The American Council on Education Rating Scale: Its Reliability, Validity, and Use

I. Foreword
This study of the American Council Rating Scale has three primary aims:

1. To summarize briefly the tested knowledge concerning the construction, reliability, validity, and uses of rating scales as a personnel tool.

2. To give those who may use this particular scale an understanding of its reliability and validity under varied types of normal use.

3. To describe the procedure to be followed in constructing and testing a rating scale for use in educational personnel work.


II. History of Rating Scales

The effort to judge and describe individual characteristics or trait differences is, of course, as old as social life itself. Its beginnings are clearly discernible in the early writings of the Greeks, and such efforts have continued to the present day. Insofar as the rating scale is concerned, the history of these judgments may be roughly divided into two periods: before Galton and since Galton. From the theory of the humors as the basis for character types onward, most characterological work rested on the general supposition that all people could be separated into relatively disparate types, and the effort was made to find out how many types there were, to describe them, and to indicate their significance. Words in common use today, such as sanguine, choleric, melancholic, and phlegmatic, are language relics of this practice.

Sir Francis Galton (27) (1869), however, in his effort to demonstrate the inheritance of peculiar eminence, was forced to find some reliable comparative standard by which to judge such eminence. In the preceding years the astronomers had discovered that individual errors in time observations of astronomical phenomena were grouped in a rather definite way about the average error. This grouping had been studied statistically and had been developed into a table of the normal probability distribution. Galton quotes M. Quetelet, the Astronomer-Royal of Belgium, "and the greatest authority on vital and social statistics." The biologists had taken up this idea and discovered that the same distribution around an average appeared in biological phenomena. Galton then went on to assume that it also appeared in social phenomena such as eminence, and he based his degrees of eminence on the normal distribution curve.

In another way, too, Galton anticipated the most recent practices in making out a rating scale. In his studies of sensitivity, he asked the subject to rate his own degree of sensation on a scale, each degree of which Galton had described beforehand in vivid and illustrative phraseology. In this way the subject was presented with a series of descriptive terms, from which he could choose the one most nearly representing what he was to rate. From Galton, then, come two fundamental assumptions of ratings. The first is that personal qualities are distributed in the population according to the frequencies of the normal distribution curve, and that equal intervals on the scale should represent equal steps in frequency on a normal curve. The second is that, given standard descriptions of such qualities arranged in linear order, a rater can give an accurate judgment, comparable with another rater's judgment, by matching his own experience against the term which appears most similar to it.
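One reading of the first assumption is that the steps of a scale mark off equal distances along the base line of the normal curve, so that the middle steps hold many more people than the extreme ones. The sketch below is our illustration, not Galton's procedure. It assumes five open-ended steps, each one standard deviation wide and centred on the average, and it uses Python's standard `statistics.NormalDist` to compute the share of a normally distributed group that falls in each step.

```python
from statistics import NormalDist


def step_frequencies(n_steps=5, step_width=1.0):
    """Share of a normally distributed group falling into each of n_steps
    equal-width steps centred on the mean.

    step_width is in standard-deviation units. The two outermost steps are
    open-ended, so the shares always sum to 1.
    """
    dist = NormalDist()  # standard normal curve: mean 0, SD 1
    half_span = n_steps * step_width / 2
    # Interior boundaries between adjacent steps, e.g. -1.5, -0.5, 0.5, 1.5
    bounds = [-half_span + step_width * i for i in range(1, n_steps)]
    cumulative = [0.0] + [dist.cdf(b) for b in bounds] + [1.0]
    return [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]


labels = ["Lowest", "Low", "Middle", "High", "Highest"]
for label, share in zip(labels, step_frequencies()):
    print(f"{label:8s} {share:6.1%}")
```

Under these assumptions the middle step holds about 38 percent of the group and each extreme step about 7 percent. Changing `step_width` shows how the proportions shift.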

Rugg (76), in his elaborate study of rating published in 1921-22, refers to a suggestion for educational ratings made by Elliott in 1910. The data mentioned were based on a scheme of 100 traits, with a dozen raters rating ten teachers, and the reliability correlations were +0.2 or less. The next scheme, also mentioned by Rugg, is one attributed to Boyce, in which 45 qualities were rated in 10 steps. Rugg reports no exact data on its reliability. Neither of these scales, as described by Rugg, seems to have had the refinement of definition, construction, and methods of use characteristic of Galton's work. They were probably composed of a large number of undefined adjectival terms, overlapping and vague in meaning, with no particular basis for the quantitative values assigned.

J. B. Miner (55), then at Carnegie Institute of Technology, read a paper before the American Psychological Association in December, 1916, entitled "The Evaluation of a Method for Finely Graduated Estimates of Abilities." This paper is very important in the history of the rating scale for several reasons. In the first place, it followed Galton's method of frequencies to some extent by having each student rated as to whether he was in the lowest fifth, fourth fifth, middle fifth or average, second fifth, or highest fifth. In the second place, as Dr. Miner points out:

"One  feature  of  the  method  consisted  in  grading  the  person  by  means 
of  a  dot  placed  on  a  line.  This  plan  of  placing  a  dot  on  a  line  was  found 
in  a  blank  prepared  by  the  B.  F.  Clark  Teachers  Agency  of  Chicago.  A 
somewhat  similar  plan  with  five  divisions  from  0  to  100  without  defini- 
tion of  the  meaning  of  the  divisions  or  of  the  standard  group  was  tried 
at  one  time  by  the  Appointment  Committee  at  Teachers  College,  Colum- 
bia University.  The  method  was  adapted  to  our  purpose  by  substituting 
divisions  into  fifths  of  a  group  instead  of  the  divisions  'superlative,  ex- 
cellent, satisfactory,  fair,  and  poor'  which  were  used  on  the  teachers 
agency  blank.     Psychologically,  the  use  of  a  dot  on  a  line  seems  to  have 
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a  decided  advantage  over  the  percentage  method  which  it  most  closely 
resembles  in  that  it  gets  rid  of  the  habit  of  thinking  that  different  per- 
centages have  qualitative  significance  as  indicating  passing  or  excellent 
grades. 

"The  blank  embodies  four  fundamental  principles  for  securing  sys- 
tematic estimates  which  are  here  combined  for  the  first  time,  so  far  as 
I  know.  They  summarize  the  result  of  much  of  the  systematic  work 
which  has  been  done  in  this  field  in  recent  years.  (1)  The  person  is 
rated  relative  to  the  members  of  a  defined  group  which  is  known  by  the 
judges  and  is  used  as  a  standard.  In  our  case  the  average  senior  class 
in  the  students  course  and  school  was  used  as  this  standard.  (2)  All 
qualitative  terms  are  avoided  since  it  is  impossible  to  define  them  so 
that  they  call  up  the  same  idea  in  the  minds  of  different  judges.  In- 
stead, we  have  used  fifths  of  the  group,  a  concept  about  which  there 
should  be  no  difference  in  opinion  as  to  what  is  meant.  (3)  The  method 
allows  the  discrimination  to  be  made  as  finely  as  the  judge  desires  and 
yet  permits  the  investigator  to  determine  approximately  how  small  di- 
visions in  that  grading  have  sufficient  reliability  to  make  them  worth 
while.  The  results  on  this  phase  of  the  problem  will  be  discussed  later 
in  the  paper.  (4)  The  units  of  measurement  may  be  readily  transmuted 
into  equivalent  units  of  the  standard  deviation  on  the  basis  of  the  dis- 
tribution of  the  judgments.  In  our  blank  the  measurements  may  be 
made  in  millimeters  or  any  larger  portion  of  the  line  and  changed  into 
units  of  the  standard  deviation  by  Thorndike's  table." 
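Principle (4) amounts to two steps. A position on the line is read as a share of the standard group, and that share is then converted into standard-deviation units. The sketch below does this with Python's `statistics.NormalDist.inv_cdf` in place of Thorndike's printed table. Several details are assumptions made for illustration, not details given by Miner: the 100 mm line length, the low-to-high direction of the line, and the clamping at the very ends.

```python
from statistics import NormalDist

LINE_LENGTH_MM = 100.0  # hypothetical length of the printed line, low end at 0


def fifth_of_group(position_mm, line_length_mm=LINE_LENGTH_MM):
    """Which fifth of the standard group the dot falls in: 1 = lowest, 5 = highest."""
    share_below = position_mm / line_length_mm
    return min(int(share_below * 5) + 1, 5)


def dot_to_sd_units(position_mm, line_length_mm=LINE_LENGTH_MM):
    """Convert a dot's distance from the low end of the line into
    standard-deviation units from the group average.

    The fraction of the line to the left of the dot is read as the fraction
    of the standard group ranked below the person. The inverse normal curve
    then gives the matching deviate (the job Thorndike's table once did).
    """
    share_below = position_mm / line_length_mm
    # inv_cdf is undefined at exactly 0 and 1, so clamp the ends (our choice)
    share_below = min(max(share_below, 0.005), 0.995)
    return NormalDist().inv_cdf(share_below)


for mm in (10, 50, 90):
    print(mm, fifth_of_group(mm), round(dot_to_sd_units(mm), 2))
```

A dot at the midpoint falls in the middle fifth and converts to 0.0 standard deviations. A dot nine-tenths of the way along falls in the highest fifth and converts to about +1.28.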

The traits to be rated were also carefully selected from a list of over three hundred compiled from the studies by Cattell, Wells, Yerkes and La Rue, Davenport's Trait Book, Mann's Study of Engineers, and others. From this list, competent judges selected fifty and ranked them in order of importance. From these, five were chosen which seemed to represent different important factors in personality from the point of view of employment. One hundred and forty seniors were rated by four judges each, about seventy judges being used altogether, and the results were studied statistically for reliability and validity. Miner's tables show the correlation between one judgment and one, between two judgments and two, and the intercorrelations of all abilities. Correlations of one judgment with one ran as high as +.65, and of two judgments with two as high as +.79. Intercorrelations between traits run as high as +.85, between general ability and common sense. In one school, the combined ratings by faculty members on general ability correlated +.75 with the combined judgment of the administrative officers. All coefficients were computed from rank orders, with "rho" transmuted to "r" according to Pearson's method.
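The rank-order procedure mentioned here can be sketched as follows. We assume that "Pearson's method" refers to the conversion r = 2 sin(πρ/6), which Pearson gave for turning Spearman's rank correlation into an estimate of the product-moment r. The judges' rankings in the example are hypothetical.

```python
import math


def spearman_rho(ranks_a, ranks_b):
    """Rank-difference correlation between two untied rankings of the same
    people: rho = 1 - 6 * sum(d**2) / (n * (n**2 - 1))."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of the same length, at least 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def rho_to_r(rho):
    """Pearson's conversion of a rank correlation to an estimated r."""
    return 2 * math.sin(math.pi * rho / 6)


# Two hypothetical judges ranking the same five seniors (1 = highest)
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 3, 5, 4]
rho = spearman_rho(judge_1, judge_2)
print(round(rho, 2), round(rho_to_r(rho), 2))
```

With these made-up rankings rho is .80 and the converted r is about .81. Over the whole range the conversion changes a coefficient by less than .02.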
This paper was published in the Journal of Applied Psychology in June, 1917. Miner mentions that W. D. Scott had suggested a way of making the rating blank more concrete. If the comparison is made with a known group that is stable enough, typical individuals can be selected from it to form a standard scale. In this form, Miner adds, the method was being tried out by the Bureau of Salesmanship Research. It is now familiar history that, when personnel work in the army was organized during the World War, Scott added to its equipment the celebrated man-to-man Scott rating scale, which had grown out of these experiments at Carnegie Institute of Technology. In the succeeding years this scale was tried out extensively in the army. After the war it was adapted to industry under the general leadership of the Scott Company.

After a brief period of widespread use and sudden popularity in the army and in industry, the rating scale was subjected to a very serious and critical study by Rugg. He based his study on data obtained from the Army records and from his own use of a similar scale that he constructed for use in the schools. Since the resulting validities and reliabilities, particularly the latter, differed so greatly from those found by Miner, it may be well to point out the major differences between the two scales. In the first place, the traits used were different. Miner used scholarship, general ability, common sense, energy, initiative, leadership, and reliability. The Scott scale in the army used physical qualities, intelligence, leadership, personal qualities, and general value to the service. In the second place, Miner's scale used a single word as the only description of each trait. (Because of its important position in the history of rating scales, the Scott scale is reproduced below in its entirety.)

Rating Scale

I. Physical Qualities
Physique, bearing, neatness, voice, energy, endurance. Consider how he impresses his command in these respects.
     Highest 15     High 12     Middle 9     Low 6     Lowest 3

II. Intelligence
Accuracy, ease in learning; ability to grasp quickly the point of view of commanding officer, to issue clear and intelligent orders, to estimate a new situation, and to arrive at a sensible decision in a crisis.
     Highest 15     High 12     Middle 9     Low 6     Lowest 3

III. Leadership
Initiative, force, self-reliance, decisiveness, tact, ability to inspire men and to command their obedience, loyalty and cooperation.
     Highest 15     High 12     Middle 9     Low 6     Lowest 3

IV. Personal Qualities
Industry, dependability, loyalty; readiness to shoulder responsibility for his own acts; freedom from conceit and selfishness, readiness and ability to cooperate.
     Highest 15     High 12     Middle 9     Low 6     Lowest 3

V. General Value to the Service
Professional knowledge, skill and experience; success as administrator and instructor; ability to get results.
     Highest 40     High 32     Middle 24     Low 16     Lowest 8

It is noticeable that under each trait name is assembled a large collection of words, many of which would probably stimulate different conceptions in the minds of the raters. In the third place, the dot-on-line feature of Miner's scale has been abandoned. So has the frequency feature, which has been replaced by the "man-to-man" criterion, which Rugg found was rarely alike from one rater to another. Lastly, the numerical values assigned to the positions on the scale are arbitrary, and those for the fifth trait differ from those for the first four.
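The effect of the arbitrary values can be seen by totalling a rating. The sketch below copies the point values from the scale as printed above. It assumes that the five values were combined by simple addition, since the scale itself does not say how they were combined. On that assumption the possible totals run from 20 to 100, and the fifth quality alone supplies two-fifths of the maximum.

```python
# Point values printed on the Army scale, listed from lowest to highest step.
STEPS = ["Lowest", "Low", "Middle", "High", "Highest"]
POINTS = {
    "Physical Qualities": [3, 6, 9, 12, 15],
    "Intelligence": [3, 6, 9, 12, 15],
    "Leadership": [3, 6, 9, 12, 15],
    "Personal Qualities": [3, 6, 9, 12, 15],
    "General Value to the Service": [8, 16, 24, 32, 40],
}


def army_total(ratings):
    """Total score, assuming (for illustration) that the five values are simply added.

    ratings maps every quality name in POINTS to one of the step names in STEPS.
    """
    if set(ratings) != set(POINTS):
        raise ValueError("rate every quality exactly once")
    return sum(POINTS[quality][STEPS.index(step)] for quality, step in ratings.items())


print(army_total({quality: "Highest" for quality in POINTS}))  # 100
print(army_total({quality: "Middle" for quality in POINTS}))   # 60
```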

A specimen of the reliability findings on which Rugg based his indictment of the man-to-man scale appears in the following table, copied from his report.

TABLE I. RUGG

Averages and Measures of Variability of 6-31 Independent Ratings of 15 Officers in a Personnel School at Fort Sheridan

No. of    No. of     Range of   His       Average Deviation   Standard    Probable
Officer   Ratings    Ratings    Average   of Ratings          Deviation   Error
Rated     on Him     on Him     Rating    on Him
1         27         52-80      65.7      6.1                 8.42        5.67
2         23         38-67      52.9      6.7                 8.11        5.47
3         27         66-92      80.9      6.4                 7.61        5.13
4         30         36-73      53.5      6.4                 8.50        5.73
5         19         53-81      63.8      5.4                 7.10        4.79
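The last two columns of the table are linked by the normal-curve relation PE = 0.6745 times SD: half of all ratings on a man would be expected to fall within one probable error of his average. The sketch below checks the printed columns against that relation. It also shows how the three summary figures could be computed from a man's individual ratings. The use of the population standard deviation (`statistics.pstdev`) is our assumption about Rugg's arithmetic. Each recomputed probable error agrees with the printed one to within .01; officer 1 gives 5.68 against the printed 5.67.

```python
from statistics import mean, pstdev

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard deviation on a normal curve

# (officer, standard deviation, probable error) as printed in Table I
TABLE_I = [
    (1, 8.42, 5.67),
    (2, 8.11, 5.47),
    (3, 7.61, 5.13),
    (4, 8.50, 5.73),
    (5, 7.10, 4.79),
]


def summarize(ratings):
    """Average, standard deviation (population form, an assumption) and
    probable error of the individual ratings on one man."""
    sd = pstdev(ratings)
    return mean(ratings), sd, PE_FACTOR * sd


for officer, sd, printed_pe in TABLE_I:
    print(officer, round(PE_FACTOR * sd, 2), printed_pe)
```

Read this way, the middle half of single ratings on officer 1 would be expected to spread from about 60 to about 71 points.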

Although the results above are based on six to thirty-one independent ratings of each officer, many more than in Miner's experiment, the resulting reliabilities are much lower. They are so low that Rugg concludes a single rating has little chance of placing an individual within the proper fifth of the distribution. Largely as a result of Rugg's thorough study, interest in rating scales began to wane from the date of his publication.

The next great alteration in scale construction after the Army scale appears to have been made by members of the Scott Company in Philadelphia, working with scales in industrial personnel procedure.

In December, 1922, Paterson (58) published in the Journal of Personnel Research an elaborate description of the Scott Company graphic rating scale. This was followed in 1923 by a thorough discussion of the same scale by Freyd (25) in the Journal of Educational Psychology. Both articles give copies of the scales in question. A specimen of the scales presented in Paterson's study follows:




(Scale B)

GRAPHIC RATING REPORT ON WORKERS

Name of Employe ______________________   Branch ______________   Department ______________   Date __________

Position of Employe ______________________   Employe Rated By ______________________

Instructions for Making Out This Report: Rate this employe on the basis of the actual work he is now doing. Before attempting to report on this employe, it is necessary to have clearly in mind the exact qualities which are to be reported on. Read the definitions very carefully. In each quality compare this employe with others in the same occupation in this company or elsewhere. Place a check (✓) somewhere on the line running from "very high" to "very low" to indicate this employe's standing in each quality. It is not necessary to put the check (✓) directly above any of the descriptive adjectives.

QUALITIES                                   REPORT

I. Ability to Learn: Consider the ease with which this employe is able to learn new methods and to follow directions given him.
      Very Superior ---- Learns With Ease ---- Ordinary ---- Slow To Learn ---- Dull

II. Quantity of Work: Consider the amount of work accomplished and the promptness with which work is completed.
      Unusually High Output ---- Satisfactory Output ---- Limited Output ---- Unsatisfactory Output

III. Quality of Work: Consider the neatness and accuracy of his work and his ability constantly to turn out work that is up to standard.
      Highest Quality ---- Good Quality ---- Careless ---- Makes Many Errors

IV. Industry: Consider his energy and application to the duties of his job day in and day out.
      Very Energetic ---- Industrious ---- Indifferent ---- Lazy

V. Initiative: Consider his success in going ahead with a job without being told every detail; his ability to make practical suggestions for doing things in a new and better way.
      Very Original ---- Resourceful ---- Occasionally Suggests ---- Routine Worker ---- Needs Constant Supervision

VI. Co-operativeness: Consider his success in effectively co-operating with his co-workers and with those exercising greater authority.
      Highly Co-operative ---- Co-operative ---- Difficult to Handle ---- Obstructionist

VII. Knowledge of Work: Consider present knowledge of job and of work related to it.
      Complete ---- Well Informed ---- Moderate ---- Meagre ---- Lacking

REMARKS: (See Reverse Side for Suggestions) ______________________

Total ______________

Final Rating ______________
Reverse Side of One of the Rating Scales for Workers

(Scale B)

Graphic Rating Report on Workers

The Purpose of Periodic Rating Reports

1. The graphic rating report is a practical method by means of which each employe's ability and fitness for promotion can be known quickly, with a reasonable degree of accuracy and with uniformity throughout the company.

2. The ratings are converted into a numerical expression indicating the ability of each person in those qualities deemed most essential, such as ability to learn new methods, quantity of work, quality of work, industry, initiative, co-operativeness, and knowledge of work.

3. Because the Rating Report calls attention separately to each of these essential qualifications, it lessens the danger that opinions will be based on minor points, with a corresponding disregard of important qualities. It is to the interest of all concerned to replace snap judgments by carefully thought-out reports.

4. This rating report has been devised after careful consideration of the best practices throughout the country. Its chief claim for the support of the supervisor and the employe is the fact that it is simple, concrete and definite. It reduces the time required to rate an employe to a minimum, yet it is so arranged that the interests of each employe are safeguarded as regards accuracy and fairness.

5. All rating reports are confidential. Any employe who is rated, however, may be told where he stands in order that he may improve himself if he so desires.

To Supervisors: Supplement Your Rating With Appropriate Remarks

When you have completed your rating of the employe on the front of this report, enter under Remarks any comments which are appropriate.

In doing so, consider the possible comments suggested here and write the numbers of any comments that are particularly pertinent.

1. Recommend that Personnel Department interview this employe to advise him

(a) How he can improve himself.

(b) Concerning his present and future opportunities.

2. Deserves promotion.

3. Desires transfer to other work.

4. Well liked by fellow-employes.

5. Would do well in a supervisory position.

6. Is handicapped physically as follows

7. Is taking a course in
This scale is obviously related to the one described by Miner, as well as to the army scale. The man-to-man feature has been dropped. The line has reappeared, this time without any divisions indicating fifths of the group. General descriptive terms such as very superior, good quality, and so on have reappeared. As on the army scale, each trait is described rather than merely labeled, but the descriptions are more sharply defined and seem to include fewer unrelated modes of behavior. A new feature is a scoring stencil, used so that the rater does not himself realize what score he is giving the individual. This stencil is divided into ten divisions numbered one to ten. The notion of frequency has clearly disappeared from the rating line, but it returns in scoring. The final score is a letter grade A, B, C, and so on, awarded on the basis of the percentage receiving each rating. The highest ten percent get A, the next twenty percent B, the next forty percent C, the next twenty percent D, and the lowest ten percent E. It is probably not too much to say that this scale is Galton's scheme with the addition of the line, as in Miner's scale. It lacks the cross divisions of Miner's line, which might prevent the rater from feeling the linear dimension of the quality he is rating.
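The scoring described above can be sketched in two steps. First, a check mark's position is read through a ten-division stencil. Second, letter grades are assigned by rank within the group using the 10-20-40-20-10 percent split. The sketch makes two choices of its own, not details from the Scott Company's instructions: the stencil divides the line into ten equal parts numbered from the low end, and ties are broken by input order.

```python
def stencil_score(position, line_length):
    """Read a check mark through a stencil of ten equal divisions,
    numbered 1 at the low end of the line to 10 at the high end."""
    if not 0 <= position <= line_length:
        raise ValueError("the check mark must lie on the line")
    return min(int(10 * position / line_length) + 1, 10)


# Cumulative share of the group (counted from the top) and the grade it earns:
# highest 10% A, next 20% B, next 40% C, next 20% D, lowest 10% E.
CUTOFFS = [(0.10, "A"), (0.30, "B"), (0.70, "C"), (0.90, "D"), (1.00, "E")]


def letter_grades(scores):
    """Assign letter grades by rank within the group.

    Ties are broken by input order, an arbitrary choice for this sketch.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: -scores[i])  # best score first
    grades = [None] * n
    for rank, i in enumerate(order):
        share_at_or_above = (rank + 1) / n
        # small tolerance so that, e.g., 3/10 counts as within the top 30%
        grades[i] = next(g for cut, g in CUTOFFS if share_at_or_above <= cut + 1e-9)
    return grades


scores = [stencil_score(p, 100) for p in (95, 88, 80, 72, 66, 58, 51, 43, 30, 12)]
print(scores)
print(letter_grades(scores))
```

Because the grades depend only on rank, the same stencil score can earn different letters in different groups. This is the sense in which frequency, dropped from the rating line, re-enters at the scoring stage.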

It is reported that foremen's ratings of their workers on the "Graphic" Scale correlate as highly as +.91 between a first and a second rating made a month apart. The average correlation between the first and second ratings of several foremen is +.76. Clearly these results differ greatly from those obtained in the army as studied by Rugg. The reliability of ratings improved as the foremen became accustomed to rating, and differences in reliability were found to be characteristic of individual foremen. It should be noted, too, that one feature of this scale was a set of very clear-cut and pertinent directions to raters, printed on the reverse side, on how to do the rating.

It is interesting to note at this point that the use of the rating scale apparently began with Galton for the purposes of pure science, and began in this country with the work by Elliott, mentioned by Rugg, for the purposes of educational guidance. Yet the real driving force in developing rating scales was the sustained demand for better performance procedures and quantitative checks on judgment, voiced by Hollingworth, Poffenberger, and Scott in their work with employment procedures.
After the work of the Scott Company, however, systematic experimentation with scales passes almost entirely into the hands of psychologists and workers in the field of educational guidance.

In February, 1921, a paper was written by Floyd H. Allport and Gordon W. Allport (2) entitled "Personality Traits: Their Classification and Measurement." This study was a serious effort to assemble experimental data on the measurement of personality traits, to organize those data in terms of theoretical psychology, and to suggest methods of continued study by the use of ratings and questionnaires. Earlier work had dealt with the individual's work habits and vocational fitness. By contrast, this study includes traits of more significance for understanding his motivation and emotional make-up. The scale presented is essentially a three-step scale; it leaves out the graphic feature, although this was known to the authors. Its description combines illustrative material with a notion of frequency in three divisions, with average, of course, as the center. No data are presented on reliabilities or validities. An excellent list of selected references, representing several points of view, is included.

In September, 1921, G. W. Allport (3) published in the Psychological Bulletin a general summary entitled "Personality and Character." This should be consulted by anyone wishing to study the systematic development underlying the concept of a personality trait. The section on ratings summarizes very briefly the work of various students of ratings up to that point and suggests the following conclusions as justified by previous work:

1. The reliability of rating varies with the traits under consideration. Raters agree more closely upon such qualities as popularity, conceit, or leadership than upon emotionality, honesty, or tact.

2. Some individuals are easier to rate than others.

3. Traits most easily rated represent the individual's reactions to objective things. Traits that are more difficult to rate represent the individual's reactions toward people.

4. A "halo" effect produces unduly high correlations between traits.

5. A final rating must be the average of three independent ratings.

6. Scales on which ratings are made must be comparable and equivalent.

7. Raters must be thoroughly acquainted with the person rated.

8. In self-rating, the individual tends to over-estimate his possession of qualities that are socially valuable and to under-estimate his possession of those which are socially undesirable.
Allport believes that, "Notwithstanding the dangers and difficulties encountered in devising and employing rating scales, we are forced to recognize this method as the only available objective criterion of personality. The sources of error must be gradually overcome by the improvement in the technic of rating."
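Conclusion 5 rests on the fact that averaging independent ratings raises their reliability. Neither Allport nor Miner is said here to have computed the gain this way, but the standard Spearman-Brown formula gives it. Averaging k ratings whose single-rating correlation is r yields kr / (1 + (k - 1)r). Applied to Miner's figures reported earlier in this section, a one-with-one correlation of +.65 predicts about +.79 for two-with-two. This happens to match the highest two-with-two value Miner reports, though the text does not say that the two maxima refer to the same trait.

```python
import math


def spearman_brown(r_single, k):
    """Predicted reliability of the average of k independent ratings, given the
    correlation r_single between two single ratings (Spearman-Brown formula)."""
    return k * r_single / (1 + (k - 1) * r_single)


def ratings_needed(r_single, target):
    """Smallest number of independent ratings whose average is predicted
    to reach the target reliability."""
    k = target * (1 - r_single) / (r_single * (1 - target))
    return math.ceil(k - 1e-9)  # tolerance guards against float round-off


print(round(spearman_brown(0.65, 2), 2))  # 0.79
print(round(spearman_brown(0.65, 3), 2))  # 0.85
print(ratings_needed(0.65, 0.90))          # 5
```

On the same assumption, three independent ratings would be expected to reach about .85. Reaching .90 from single ratings correlating .65 would take five.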

A fairly comprehensive bibliography of work on personality scales and tests is appended.

In 1924, Symonds (85) published a survey entitled "The Present Status of Character Measurement." In the opening paragraph he notes that in 1921 many contributors to a symposium on intelligence and its measurement stated that one of the next steps in research was to develop the measurement of character. Colvin, Pintner, Pressey, Terman, and Thurstone are all quoted as saying, in general, that the time was ripe for a more active investigation of the emotions, the character, and traits of personality. Symonds bases his review on the work by G. W. Allport quoted above. He notes that the literature to date reveals eight methods, in somewhat definite form, for studying character. These are:

1. Habit Scales.

2. Character Scales.

3. Self-assurance or overstatement test.

4. A specific test of trustworthiness known as the squares and circle test.

5. A specific test of trustworthiness known as the Parafin Completion Test.

6. Speed of decision test.

7. The questionnaire.

8. Ethical judgment test.

Of particular interest from the point of view of rating scale development is the discussion of the Upton-Chassell scale for measuring the importance of good citizenship, published in 1919. Symonds regards it as the first scale attempting to give a rating scheme for conduct habits as opposed to traits. Other habit scales are noted. As Symonds points out, the distinction is that habits are dynamic whereas traits are static. At this point, then, the student of rating scales becomes conscious of the possibilities for rating actions rather than inferring the possession of certain degrees of traits.

The following characteristics of rating technic, included in Symonds' summary, appear to be significant and new:

- reliability coefficients as high as .97 on social adjustment traits (Porteus 66);
- a higher reliability of a rating in which the rater is very sure of his judgment (Cady 10);
- increase of reliability through a period of preliminary observation;
- decrease in accuracy through extended acquaintance (Knight 45).

Symonds suggests in his conclusion that we very badly need a scale of generalness of conduct habits, determined as accurately and scientifically as possible. With it, we might discover the relationship between the generalness of a habit and the reliability of ratings on that habit. This is an interesting suggestion because it reflects further careful thought about the real nature of the traits being rated. In particular, it reflects scepticism about the extent to which there are such things as general traits rather than specific habits.

The psychological processes involved in making the judgments on which ratings are based were thoroughly discussed in 1922 by Hollingworth (36) in Judging Human Character. He too notes several difficulties especially pertinent to the use of rating scales:

- the existence of the halo effect;
- the actual intercorrelation between many of the traits used, because those traits overlap;
- the "central tendency of judgment," meaning our tendency to under-rate extremely high and over-rate extremely low individuals;
- the other difficulties involved in the use of the scale.

The scale discussed in this book is of essentially the same form as Miner's. Hollingworth makes a suggestion which apparently appears for the first time in the literature of the rating scale: namely, that raters should furnish with the rating what he calls a "narration of instances" or "record of facts."

Laird's  Psychology  of  Selecting  Men  (48)  and  Bingham 
and  Freyd's  Procedures  in  Employment  Psychology  (7),  pub- 
lished respectively  in  1925  and  1926,  both  bring  into  the  in- 
dustrial use  of  the  rating  scale  the  products  of  research  in  the 
colleges.  Both  present  the  graphic  rating  scale  in  somewhat 
the  same  form  as  that  used  by  the  Scott  Company  as  the  most 
improved form up to date. Bingham and Freyd include a
sample  scale  in  which  the  trait  descriptions  at  the  left  are  in 
the  form  of  questions  rather  than  descriptions  of  the  traits 
themselves,  and  sometimes  these  questions  are  concerned  with 
the  habits  of  the  individual  to  be  rated.  Underneath  the  lines 
are  terms,  some  vague  and  some  fairly  definite.  The  scale 
seems  to  occupy  an  intermediate  position  between  the  typical 
graphic  scale  as  first  developed  and  later  developments  from  it. 

In  addition  to  the  preceding  systematic  studies  of  the  gen- 
eral problem  of  measuring  and  rating  character,  two  fairly 
recent  experimental  studies  of  rating  scale  technique  are  of 
interest.  Marsh  and  Perrin  (51)  in  1924-25  published  an  ex- 
perimental study  of  rating  scale  technique  in  which  84  sub- 
jects consisting  of  48  women  and  36  men  were  rated,  using 
a  group  of  15  raters.  Three  scale  forms  were  used :  graphic, 
percentage,  and  man-to-man.  Each  scale  had  five  gradations. 
The  traits  included  were  the  following:  Size  of  Head,  Tend- 
ency to  Laugh,  Care  and  Attention  to  Hair,  Skin,  Lips,  Eyes, 
Nose,  and  Hands,  Voice,  Distinctness  of  Articulation,  Mobil- 
ity of  Facial  Expression,  Physical  Attractiveness,  Poise  and 
Self-Control,  Gracefulness,  Emotional  Attitude  in  the  Labora- 
tory Room,  Efficiency  in  the  Card  Sorting  Test,  Efficiency  in 
the  Aiming  Test,  General  Intelligence  During  the  Laboratory 
Period, Ability in Leadership. Copies of the scale are not
provided, so it is difficult to judge the methods of constructing
the  various  scales.  The  authors'  conclusions  indicate  that  no 
one  form  of  scale  demonstrates  superiority  over  the  others. 
The  ratings  approximate  normal  distributions.  Considerable 
reliability  was  disclosed.  Correlations  between  traits  which 
would  logically  be  correlated  ran  as  high  as  .79,  between 
gracefulness  and  physical  attractiveness. 

Kornhauser  (47)  published  in  the  Journal  of  Personnel 
Research  in  1926-27  a  series  of  very  careful  studies  of  ratings. 
The  scale  used  is  called  a  graphic  scale  and  has  five  intervals 
with  each  interval  separated  from  the  others  by  a  vertical  line. 
This  scale  goes  back  again  toward  the  Miner  type  of  scale.  The 
traits  are  merely  labeled  in  the  left-hand  margin,  with  a  three- 
or  four-word  description  of  each  degree,  using  such  general 
terms as fairly, moderately, well, etc. The resulting ratings
are well spread, approximating the normal distribution
curve. The traits used are intelligence, industry, accuracy,
cooperativeness, initiative, trustworthiness, and leadership
ability. Coefficients of reliability between the two members
of a pair of instructors are surprisingly high, averaging +.41
for all traits and all instructors but ranging as high as
+.82 between the ratings on accuracy by one pair of
instructors. The trait most accurately rated was industry, with an
average correlation of +.53. The preceding figures are for
cases where more than twelve students were rated by the
same instructors. The number of cases rated by each pair
of instructors varies. It is very significant that the
reliabilities of ratings by the same instructor at different times are
considerably greater, averaging +.78 for intelligence, for example.
The inter-correlations reported are high but do not differ
significantly from those found in previous studies of such inter-correlations.
Correlations  between  grades  and  ratings  in  the  case  of  fifty 
students  ranged  from  +.83  between  intelligence  and  grades  to 
+  .58  between  moral  trustworthiness  and  grades  and  +.81  for 
total  rating  in  all  traits  and  grades.  There  is  no  indication 
that  these  coefficients  have  been  corrected  for  attenuation. 
Dr.  Kornhauser  concludes  that  on  purely  statistical  grounds 
it  is  apparent  that  the  estimates  are  not  wholly  worthless 
and  that  ratings  can  be  greatly  improved  through  changes  in 
procedure  and  instruction  to  raters.  In  view  of  the  fact  that 
the  construction  of  the  scale  did  not  include  some  of  the  most 
recent  developments,  these  figures  would  indicate  definite 
achievement  in  reliabilities  and  validities. 
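Kornhauser's coefficients, as noted above, were apparently not corrected for attenuation. Spearman's correction estimates what a correlation would be if both measures were perfectly reliable. It does this by dividing the observed r by the square root of the product of the two reliability coefficients.

The sketch below is illustrative only, and several of its inputs are assumptions:

- The function name is hypothetical.
- The +.83 and +.78 figures come from the paragraph above.
- Treating the same-instructor coefficient (+.78) as the reliability of the ratings is itself an assumption.
- The .90 reliability for grades is an assumed value, not one reported by Kornhauser.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction for attenuation (illustrative sketch).

    r_xy -- observed correlation between measures x and y
    r_xx -- reliability coefficient of x
    r_yy -- reliability coefficient of y
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliability coefficients must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


# Observed ratings-grades r for intelligence (+.83); rating reliability taken
# as the same-instructor coefficient (+.78); grade reliability ASSUMED to be .90.
corrected = correct_for_attenuation(0.83, 0.78, 0.90)
print(f"corrected r = {corrected:.2f}")
```

A corrected value near or above 1.0 usually signals underestimated reliabilities or large sampling error. Such figures are therefore best read with caution.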

A more general review of rating scales is that by Watson
(97) in 1927. The only general findings concerning methods of
rating that appear in the review and have not previously been
mentioned in this summary are the following: ratings become more
reliable when a general trait is broken into a number of specific
factors (Furfey 26); raters are frequently unable to justify their
ratings or are apt to give absurd rationalizations, though this does
not indicate anything about the reliability of the ratings
(Landis 49).
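The first of these findings can be related to the Spearman-Brown formula. The finding is that ratings gain in reliability when a general trait is split into specific factors; Furfey used eighteen. The Spearman-Brown formula predicts the reliability of a composite of k comparable parts from the reliability r of one part as kr / (1 + (k - 1)r).

This is offered only as one possible explanation. It rests on the strong assumption that the factor ratings behave as parallel measures of the same trait. The function name and the .30 figure are hypothetical, not taken from Furfey.

```python
def spearman_brown(r_part: float, k: float) -> float:
    """Predicted reliability of a composite of k parallel parts (sketch).

    r_part -- reliability of a single part (e.g. one specific-factor rating)
    k      -- number of parallel parts combined
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return k * r_part / (1 + (k - 1) * r_part)


# Hypothetical: each of 18 specific-factor ratings has reliability .30.
composite = spearman_brown(0.30, 18)
print(f"predicted composite reliability = {composite:.2f}")
```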

May and Hartshorne, in "Recent Improvements in Devices
for Rating Character," submitted for publication April 1,
1929, and published in the February, 1930, number of the
Journal of Social Psychology, have brought the general survey
up to date, noting that the greatest single improvement is the
development of what is called a conduct scale. The particular
scale to which these authors refer is that used by the Character
Education Inquiry. Each item on the scale represents an
observable mode of conduct. The judge is not asked to place
the child on a scale nor to give an opinion concerning the
amount of the trait in question, but simply to give a judgment
of fact concerning his behavior tendencies. The sample
submitted is the following:

Cooperation : 

A.  Works  with  others  if  asked  to  do  so. 

B.  Works  better  alone.    Can  not  get  along  with  others. 

C.  Works  well  and  gladly  with  others. 

D.  Indifferent  as  to  whether  or  not  he  works  with  others. 

E.  Usually  antagonistic  or  obstructive  to  joint  effort. 

Dr.  Yepsen,  who  has  developed  this  scale,  reports  a  re- 
liability of  +.77,  and  states  that  the  scale  will  satisfactorily 
discriminate  between  children  who  offer  the  greatest  and  the 
least  social  maladjustments. 

The  recent  history  of  the  rating  scale  as  a  device  for  measur- 
ing and  recording  personality  traits  or  modes  of  behavior  may 
well  be  summarized  in  the  following  paragraph  from  this 
article  by  May  and  Hartshorne : 

For  a  while  it  seemed  that  rating  scales  as  scientific  instruments 
would  be  completely  discarded.  It  was  necessity  that  saved  the  day. 
While  everyone  talked  about  the  superiority  of  objective  tests,  yet  it  was 
soon  found  that  many  qualities  of  character  yield  only  stubbornly  and 
expensively  to  objective  testing.  If  character  and  personality  studies 
were  to  continue,  ratings  had  to  be  revived.  In  spite  of  their  difficulties, 
snares,  delusions,  and  pitfalls  they  are  now  staging  a  considerable 
"comeback." 

May  and  Hartshorne  list  also  extreme  modifications  of  the 
rating  scale,  such  as  check  lists  in  which  teachers  check  de- 
scriptive adjectives  instead  of  phrases  and  sentences,  portrait 
matching  in  which  teachers  decide  which  of  several  word 
portraits  best  describes  the  individual  child,  and  a  "guess 
who"  test  in  which  the  pupils  are  asked  to  guess  which  child 
is  represented  by  a  word  portrait.  Thus  Hollingworth's  sug- 
gestion of  narration  of  fact  and  supporting  instances  and 
Symonds' suggestion of habit scales begin definitely to enter
the  minds  of  those  constructing  rating  procedures. 

A  summary  of  general  principles  involved  in  making  and 
using  scales  follows.  This  summary  has  been  taken  from 
many  sources,  principally  Watson,  Paterson,  and  Freyd. 
Where  responsibility  for  the  suggestion  can  be  fixed  it  is  so 
indicated. Some not so indicated have no doubt been made in
similar  form  before.  The  classification  is  loose  but  helps 
organize  the  more  or  less  unrelated  ideas. 

I. Traits.

1.  Traits  differ  in  the  success  with  which  they  can  be  rated. 
In  general  it  seems  desirable  that  ratings  be  based  upon  past 
or present accomplishment, that they be as objective as possible,
and that they be stated unambiguously and specifically (Paterson
and Kingsbury).

2.  Ratings  become  more  reliable  when  a  general  trait  (e.g., 
developmental  age)  is  broken  into  a  number  (18)  of  specific 
factors  (Furfey  26). 

3.  "General  all  around  value"  is  frequently  more  reliably 
rated  than  are  some  of  the  more  specific  qualities  involved 
(Rugg  and  Slawson  82). 

4.  Be  sure  the  trait  is  not  a  composite  of  several  that  vary 
independently  (Freyd). 

5.  Each  quality  should  refer  to  a  single  type  of  activity  or 
to  the  results  of  a  single  type  of  activity  (Paterson). 

6.  Do  not  use  scales  to  rate  traits  on  which  other  more  de- 
pendable data  can  be  obtained. 

II. Raters

7.  Self-ratings  tend  to  be  too  high  on  desirable  traits  and 
too  low  on  undesirable  traits.  They  tend,  however,  to  place 
the  strong  and  weak  points  of  the  individual  in  their  general 
positions.  One  tends  to  rate  one's  own  sex  higher  than  the 
opposite  sex  on  desirable  traits,  the  reverse  being  true  of 
undesirable  traits  (Knight  44,  Franzen  44,  Kinder,  and  Shen 
79). 

8.  Raters  are  frequently  unable  to  justify  ratings,  or  are 
apt  to  give  absurd  rationalizations.  This  does  not,  however, 
indicate  anything  about  the  reliability  of  the  rating  (Landis 
49). 

9.  Ratings  of  which  the  rater  expresses  himself  as  "very 
sure"  are  markedly  more  reliable  than  are  ordinary  ratings 
(Cady  10). 

10.  There  is  some  evidence  that  immediate  emotional  reac- 
tions affect  ratings  made  upon  the  "scale  of  values"  method 
more  than  they  do  ratings  made  when  subjects  are  ranked  in 
order  of  merit  (Conklin  and  Sutherland  18). 
11. The average or median rating of a number of judges
is  superior  to  that  of  a  single  judge,  provided  there  are  not 
great  differences  in  the  capability  of  the  judges  (Rugg,  Pater- 
son  and  Gordon). 

12.  People  differ  markedly  in  their  ability  to  make  ratings 
(Norsworthy  56,  Rugg  and  Paterson). 

13.  Record  and  use  only  those  ratings  from  raters  of  proved 
reliability  (Paterson). 

14.  People  who  are  good  judges  of  themselves  tend  to  be 
good  judges  of  others. 

III. Rating

15.  While  close  associates  are  likely  to  rate  more  reliably 
than  are  casual  associates,  long  and  intimate  friendships  bring 
marked  decreases  in  the  reliability  of  ratings.  Persons  tend 
to  over-rate  intimate  friends  on  desirable  traits  and  under- 
rate on  less  desirable  traits  (Knight  and  Shen). 

16.  Judges  who  have  been  asked  to  observe  for  several 
months,  preparatory  to  rating,  presumably  give  better  ratings 
than  do  judges  whose  observation  has  been  more  or  less  casual 
(Webb  98). 

17.  Raters  should  rate  exclusively  on  the  basis  of  past  or 
present  conduct  (Paterson). 

18.  Ratings  obtained  in  advance  of  any  special  situation 
necessitating  their  use  are  more  likely  to  be  accurate  (Pater- 
son). 

19.  Raters  having  one  form  of  contact  with  the  individual 
being rated (teachers of the same school subject) tend to
agree  more  closely  than  do  raters  with  more  diversified  con- 
tacts. By  the  same  token,  ratings  obtained  from  persons  hav- 
ing predominantly  one  type  of  contact  are  much  less  useful 
outside  of  that  specific  field   (Hanna  31). 

20. Raters should be carefully trained by discussing the
distribution of abilities, by describing the scale, and by
cautioning them against constant errors such as the halo effect,
central tendency, and prejudice.

21.  One  trait  should  be  rated  through  the  entire  group  of 
subjects,  rather  than  permitting  the  rating  of  one  subject 
through  the  entire  group  of  traits  (Symonds  and  Paterson). 

22.  People  differ  in  their  reliability  as  subjects  for  ratings. 
Some are easier to rate than others. It appears that poor
employees tend to be better analyzed than are good ones
(Norsworthy,  Rugg  and  Kingsbury  43). 

IV.  Scale  Construction. 

23. Rating scales to be used in ordinary situations should
be  simply  stated,  and  capable  of  being  used  easily  (Paterson). 

24.  Qualities  should  be  grouped  according  to  the  accuracy 
with  which  they  can  be  rated  (Paterson). 

25.  It  is  desirable  to  have  traits  defined.  This  definition 
should  be  as  simple  as  possible,  but  unambiguous,  definite,  ob- 
jective (Paterson). 

26. Avoid general terms, such as: very, extremely, average,
excellent  (Freyd). 

27.  Statistically  considered,  seven  seems  to  be  the  optimum 
number  of  intervals  for  scaling  behavior  (Symonds). 

28.  There  is  no  significant  difference  between  the  results 
obtained  by  scales  which  demand  that  the  rater  shall  rank 
the  subjects  in  order  of  merit,  and  scales  which  provide  a  range 
of  values  which  may  be  assigned  each  person.  The  latter  is 
more  congenial  to  most  raters  (Symonds). 

29.  A  graphic  scale  which  gives  one  sheet  for  each  trait, 
indicating  over  each  of  the  five  or  seven  sections  of  the  line- 
graph  the  approximate  number  or  per  cent  of  the  group  who 
should  be  given  ratings  in  that  general  vicinity,  tends  toward 
a  more  widespread  and  normal  series  of  ratings  (Symonds). 

30.  The  graphic  rating  scale,  in  which  the  rater  places  a 
check upon a line rather than using statistical terms, has the
advantage of permitting fine discriminations and of being
congenial to raters. Adjectives are usually placed along the line
to  indicate  the  meaning  of  sections  of  the  line.  Such  scales 
should  be  at  least  five  inches  long,  no  breaks  or  divisions 
should  be  made  in  the  line,  the  extremes  and  one  to  three  other 
points  should  be  defined  in  terms  of  universally  understood 
words  which  are  not  too  general  in  scope,  and  the  favorable 
extremes  should  be  alternated  to  correct  the  motor  tendency 
(Freyd). 

31.  The  line  should  not  be  much  more  than  five  inches  long, 
otherwise  it  cannot  be  easily  grasped  as  a  whole  (Freyd). 

32.  Decide  definitely  on  extremes  of  ability  probably  occur- 
ring among  persons  to  be  rated  (Freyd). 

33.  Have  extreme  phrases  set  flush  with  end  of  line 
(Freyd). 
34. End phrases should not be so extreme as to be avoided
by  raters  (Freyd). 

35.  Place  average  or  neutral  term  in  center  (Freyd). 

36. Make descriptive phrases closer in meaning to the central
one than to the next outer one (Freyd).

37.  The  scale  should,  ordinarily,  yield  a  normal  distribution. 
If  it  does  not,  this  may  be  statistically  corrected.  In- 
dividuals who  rate  constantly  low  or  high  should  have  their 
ratings  corrected  (Freyd,  Kelley,  T.  L.  41  and  Paterson). 

38.  There  is  a  tendency  to  skew  the  rating  of  every  specific 
trait in the direction of the total reaction of the rater to the
subject. This is the well-authenticated "halo effect" (Thorndike
89, Rugg, Hollingworth, Knight and Franzen).

39.  Study  reliabilities  and  validities  and  retain  only  a  short 
scale  of  reliable  and  valid  items  (Freyd). 
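Items 11 and 37 together suggest a simple pooling procedure. First remove each rater's constant error (a habitually high or low level), then average the judges.

One modern way to do this is assumed here rather than taken from Freyd, Kelley, or Paterson. It converts each rater's ratings to standard scores (mean 0, standard deviation 1) and then averages the standard scores subject by subject. The function names, rater names, and ratings are hypothetical.

```python
from statistics import mean, pstdev


def standardize(ratings):
    """Convert one rater's ratings to z-scores (mean 0, population SD 1)."""
    m = mean(ratings)
    sd = pstdev(ratings)
    if sd == 0:
        raise ValueError("all ratings identical; cannot standardize")
    return [(x - m) / sd for x in ratings]


def pooled_scores(ratings_by_rater):
    """Average the standardized ratings of several raters, subject by subject.

    ratings_by_rater -- dict mapping rater -> list of ratings, with the
    subjects listed in the same order for every rater.
    """
    lists = list(ratings_by_rater.values())
    if len({len(r) for r in lists}) != 1:
        raise ValueError("every rater must rate the same subjects")
    z_lists = [standardize(r) for r in lists]
    return [mean(z[i] for z in z_lists) for i in range(len(lists[0]))]


# Hypothetical five-point ratings of four subjects by two raters.
ratings = {
    "lenient": [5, 4, 5, 3],  # consistently high
    "severe": [3, 2, 3, 1],   # same ordering, two points lower throughout
}
print([round(s, 2) for s in pooled_scores(ratings)])
```

The two raters differ only by a constant two points, so their standard scores are identical. The pooled result is therefore unaffected by the lenient rater's generosity.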


III.  Origin  of  This  Study 

In  July,  1927,  the  American  Council  on  Education  created 
a  committee  on  Personnel  Procedure.  Working  under  a  gen- 
eral or  executive  committee  there  were  four  subcommittees, 
each  presented  with  a  specific  task  in  developing  improved 
tools  for  personnel  procedure.  Committee  I  was  to  construct 
a  cumulative  personal  record  form  which  could  be  a  basis  for 
student  guidance,  student  accounting,  and  personnel  research. 
Committee  II  was  to  construct  and  publish  objective  tests  of 
educational  achievement  to  supplement  existing  tests  of  men- 
tal alertness,  the  scores  from  both  to  feed  comparable  and  de- 
pendable information  into  the  cumulative  record.  Committee 
III  was  to  study  thoroughly  the  rating  scale  as  a  device  for  se- 
curing and  standardizing  data  about  personality  traits  not  spe- 
cifically revealed  by  test  scores  and  other  data  on  the  personal 
record.  Committee  IV  was  to  prepare  model  forms  for  oc- 
cupational studies  and  monographs  which  might  be  an  ade- 
quate tool  for  vocational  and  educational  guidance  of  college 
students. Later a fifth committee, Committee V, was added
to study the techniques (rather than the tools) involved in the
personality development of college students.

Committee  III  (on  rating  scales)  met  at  West  Point  in 
July,  1927,  in  conjunction  with  all  the  other  committees.  Dr. 
David  A.  Robertson,  Associate  Director  of  the  American  Coun- 
cil on  Education,  and  formerly  Professor  of  English  and  Dean 
of  the  College  of  Arts  and  Sciences  of  the  University  of  Chi- 
cago, was  appointed  Chairman.  The  other  members  of  the 
committee  were  Donald  G.  Paterson  of  the  psychology  depart- 
ment of  the  University  of  Minnesota,  Edward  K.  Strong,  Jr., 
of  the  psychology  department  of  Stanford  University,  Grace 
E.  Manson,  of  the  personnel  department  of  Michigan  Univer- 
sity, and  Francis  F.  Bradshaw,  dean  of  students  at  the  Uni- 
versity of  North  Carolina.  Messrs.  Paterson  and  Strong  had 
in  the  recent  past  done  expert  work  on  rating  scales  and  other 
devices  intended  to  provide  objective  measures  of  personality 
traits.  Dr.  Manson  had  recently  compiled  a  comprehensive 
bibliography  on  personality.  The  other  two  committee  mem- 
bers had  administrative  experience  in  using  such  tools  in  the 
effort  to  understand  and  guide  college  students. 
Out of these various backgrounds of experience and
competence, during the two-day session, the committee formulated
several  tentative  bases  for  future  study,  experimentation,  and 
discussion.  The  following  is  taken  from  the  published  min- 
utes. 

Personnel  work  demands,  in  addition  to  ability  and  aptitude  tests, 
estimates  and  measurements  of  personality  traits;  .  .  .  that  it  recog- 
nizes rating  scale  techniques  as  provisional,  pending  development  of 
objective  measurements;  that  meantime  sufficient  progress  in  measuring 
certain  personality  traits  has  been  made  to  warrant  trial  at  the  present 
time;  that,  in  view  of  the  small  number  of  valid  tests  of  personality 
traits,  it  recognize  that  rating  scales  would  be  necessary  for  some  time 
to  come.  The  committee  suggested  certain  principles  to  safeguard  and 
improve  rating  procedures:  (1)  Only  traits  observed  by  the  rater  should 
be  measured.  (2)  Only  those  traits  for  which  valid  objective  measure- 
ments are  not  now  available  should  be  considered.  (3)  If  instructors 
are  to  rate  large  numbers  of  students,  the  number  of  items  should  not 
exceed  five.  (4)  Traits  should  be  mutually  exclusive.  (5)  No  single 
trait  should  include  unrelated  modes  of  behavior.  The  committee  under- 
took to  make  a  rating  scale  on  these  principles  for  use  in  a  cooperative 
experiment  among  selected  secondary  schools  and  colleges,  and  to  prepare 
instructions  for  the  guidance  of  raters  and  of  the  writers  of  specific 
case  records  or  character  sketches.  The  committee  emphasized  the  im- 
portance of  training  raters  if  valid  ratings  were  to  be  obtained.* 

Of  the  succeeding  three  years'  study  and  experimentation 
this  report  deals  only  with  that  portion  relating  to  the  Ameri- 
can Council  Personality  rating  scale  as  developed  by  Commit- 
tee III  to  date. 

In  spite  of  the  lack  of  any  very  definitive  work  on  this  kind 
of  scale  up  to  that  time,  a  preliminary  survey  of  the  extent 
and  nature  of  its  use  among  the  colleges  and  universities  of 
the  country  yielded  the  data  contained  in  the  following  sum- 
mary, based  on  a  questionnaire  and  tabulation  done  for  the 
Committee  in  July,  1927,  by  W.  E.  Parker  of  the  University 
of  Michigan. 

In July 210 members of the American Council on Education were
invited to send to the office samples of the record forms used by them.
Although  many  colleges  were  closed  during  this  period,  78  institutions 
had  submitted  personal  record  forms  by  September  14,  1927.  .  .  . 

To  measure  the  personality  of  their  students  38  of  the  78  colleges  re- 
sort to  rating  scales,  listing  100  items.  This  relatively  large  number  of 
items  may  be  an  indication  of  the  importance  attached  by  the  colleges 
to the measurement of character; or it may be an indication of the
intangibility  of  the  material  sought.  The  chaotic  condition  revealed  by 
the  exhibit  of  rating  scales  would  suggest  that  the  latter  is  the  more 
likely  explanation.  The  38  institutions  seek  to  rate  118  traits.  There  is 
some  overlapping.  The  report  affords  640  statistical  items  for  study. 
The smallest number of traits rated by any college is 5; the greatest is
57.  The  average  is  14.  It  will  be  remembered  that  the  Committee  on 
Personality  Measurement  suggested  that  raters  not  be  asked  to  rate 
more  than  5  in  case  the  number  to  be  rated  is  large.  The  lowest  range 
of  rating  is  3;  the  average  seems  to  be  5;  some  seem  to  be  of  indefinite 
range.  Ratings  are  sought  from  one  or  several  of  these  raters:  in- 
structors, principals,  friends,  business  references,  employers,  ministers, 
deans,  other  students,  the  student  himself  (self-rating).  The  number 
of  raters  varies  from  1  to  15.  There  seems  to  be  little  effort  to  secure 
ratings  periodically  on  the  same  person  whether  from  the  same  raters  or 
others. Some of the traits most frequently mentioned are these:
intelligence, 23 colleges; leadership, 21; initiative, 19; companionability, 19;
cooperation, 19; personality, 17; industry, 17; reliability, 16;
perseverance, 16; integrity, 15; alertness, 13; personal appearance, 13;
scholarship, 13; originality, 11; self-reliance, 10; punctuality, 6.¹

On the basis of the preceding data and the general principles
formulated at the first meeting, the Committee constructed
at its second meeting a scale of seven traits. These traits
were selected on the following grounds: they were not measured
better by existing objective tests; they were the most universally
used; they included the information apparently most significant
for admission officers, graduate and professional
school deans, and industrial personnel officers; they were
likely to be observable by secondary school and college teachers
and other raters; and they had stood up best of the trait ratings
so far subjected to preliminary experimentation by miscellaneous
workers in this field. The scale as then constructed
and worded was mimeographed for distribution by the Council.
The following is a copy of the scale, which in this study will
be called Scale I:


¹ The Educational Record (Washington, D. C., October, 1927), vol. viii,
No. 4, pp. 316, 319, and 320.


IV.  First  Scale. 

PERSONALITY  MEASUREMENTS 

The   information  on  this  sheet   is  confidential. 

Name  of  Student 

Selection  and  guidance  of  students  are  based  on  scholastic  records  of 
achievement,  health  and  other  factual  records.  Personality,  difficult  to 
evaluate,  is  of  great  importance.  You  will  greatly  assist  in  the  education 
of  the  student  named  if  you  will  rate  him  with  respect  to  each  question  by 
placing  a  check  mark  on  the  appropriate  horizontal  line  at  any  point 
which  represents  your  estimate  of  the  candidate. 

If  you  have  had  no  opportunity  to  observe  the  student  with  respect  to 
a  given  characteristic,  please  place  a  check  mark  in  the  space  at  the 
extreme  right  of  the  line. 

Are his appearance and manner an asset or a liability?
    Avoided by others | Tolerated by others | Unnoticed by others | Well liked by others | Sought by others | No opportunity to observe

Does he get his work done on time?
    Work frequently late | Work usually on time | Work always done promptly | No opportunity to observe

Does he need constant prodding or does he go ahead with his work without being told?
    Needs much prodding in doing ordinary assignments | Does ordinary assignments of his own accord | Completes suggested supplementary work | Seeks and sets for himself additional tasks | No opportunity to observe

Can you rely on him?
    Fulfils obligations only as convenient; makes excuses | Usually fulfils obligations | Scrupulous in fulfilling obligations | No opportunity to observe

Does he get others to do what he wishes?
    Satisfied to have others take the lead | On occasions takes the lead | Displays marked ability to lead his fellows; makes things go | "Born leader" | No opportunity to observe

Does he have a "good disposition"?


Tends  to 
be  unhappy 

Is  his  disposition 

Too  easily 
moved  to  tears, 
anger  or  fits  of 
depression,  etc. 


Contented 


Generally 
cheerful 


a  help  or   a  hindrance? 

Occasionally 

over  emotional 


Well  balanced 


Dull;  unre- 
sponsive 


Occasionally 
unresponsive 


Enthusiastic 


Unusual  bal- 
ance of  re- 
sponsiveness 
and  control 


American  Council  on  Education:  Committee  on  Personnel  Methods; 

Trial  Rating  Scale,  January  1,  1928. 



V.  Testing  the  Construction  of  the  Scale 
The  first  studies  made  were  concerned  with  the  inter-cor- 
relation between  traits  on  this  scale,  in  an  effort  to  include 
within  seven  ratings  as  many  significant  independent  variables 
as  possible. 

The  cooperation  of  a  fraternity  chapter  at  the  University  of 
North  Carolina  was  secured.  The  mimeographed  scale  was 
used  with  an  explanation  but  no  special  preliminary  training 
of  the  raters.  Thirty-one  ratings  were  secured  on  each  of  the 
thirty-three  members  of  the  fraternity.  This  was  done  in  the 
spring  after  the  members  had  been  in  close  association  for 
more  than  six  months.  Rank  order  lists  were  made  for  each 
trait  and  "rho"  obtained  for  inter-correlations  between  the 
traits.  Equivalent  values  of  "r"  were  substituted  from  Table 
XX,  p.  192,  Garrett,  Statistics  in  Psychology  and  Education 
(New  York,  1926).* 
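The computations behind the tables that follow can be sketched in three steps:

1. Spearman's rank-difference formula for rho.
2. Conversion of rho to an equivalent r.
3. The probable error of r.

The text takes its r values from Garrett's table. The sketch assumes that table follows Pearson's relation r = 2 sin(πρ/6), a standard conversion that has not been checked against the table itself. The function names and the five-subject ranks are hypothetical, and tied ranks are not handled.

```python
import math


def spearman_rho(ranks_x, ranks_y):
    """Rank-difference correlation: rho = 1 - 6*sum(d^2) / (N(N^2 - 1)).

    Assumes untied ranks 1..N for the same N subjects in the same order.
    """
    if len(ranks_x) != len(ranks_y):
        raise ValueError("rank lists must be the same length")
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def rho_to_r(rho):
    """Pearson's conversion of rho to an equivalent r (assumed to match Garrett's table)."""
    return 2 * math.sin(math.pi * rho / 6)


def probable_error(r, n):
    """P.E. of r = .6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# Hypothetical ranks of five subjects on two traits.
trait_a = [1, 2, 3, 4, 5]
trait_b = [2, 1, 4, 3, 5]
rho = spearman_rho(trait_a, trait_b)
r = rho_to_r(rho)
print(f"rho = {rho:.2f}, r = {r:.3f}, P.E. = {probable_error(r, len(trait_a)):.3f}")
```

As a check, `probable_error(0.42, 33)` gives .097. This matches the P. E. printed beside r = .42 in the fraternity table that follows, where thirty-three members were rated.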

TABLE II.

Inter-correlations Between Traits, Scale I.

Thirty-one Fraternity Members Rating Thirty-three.

Traits      r       P. E.
1 vs 2     .007    .118
1 vs 3     .17     .113
1 vs 4     .30     .108
1 vs 5     .56     .085
1 vs 6     .70     .061
1 vs 7     .42     .097
2 vs 3     .89     .052
2 vs 4     .48     .092
2 vs 5     .30     .108
2 vs 6     .005    .117
2 vs 7     .32     .105
3 vs 4     .62     .073
3 vs 5     .41     .098
3 vs 6     .10     .116
3 vs 7     .53     .087
4 vs 5     .40     .098
4 vs 6     .005    .117
4 vs 7     .48     .091
5 vs 6     .40     .098
5 vs 7     .42     .097
6 vs 7     .37     .101

* Unless otherwise indicated, all correlations reported in this paper
have been obtained by first obtaining rho through the formula
$\rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$ and then transmuting to r as first
described. P. E. is obtained for the r by the formula
$\mathrm{P.E.} = .6745 \times \frac{1 - r^2}{\sqrt{N}}$.

The following table shows the inter-correlations between
traits when 15 students were rated by 5 of their most recent
instructors, using Scale I. The probable error (P. E.) is noted
to the right.

TABLE  III. 

Inter-correlations  Between  Traits,  Scale  I. 

Instructors  Rating  Students. 

Traits  r  P.  E. 

Ivs2 

1  vs3 

1  vs  4 

1  vs  5 

lvs6 

1  vs  7 
*2vs3 
*2  vs  4 

2vs5 

2vs6 
*2vs7 
*3  vs4 

3vs5 

3vs6 
*3vs7 
*4  vs  5 

4vs6 
*4vs7 
*5vs6 

5  vs  7 

6  vs7 

*  Greater  than  6  X  P.  E. 

In  general  this  table  substantiates  the  preceding  in  showing 
the  close  relationship  between  Traits  two,  three,  four,  five  and 
six.  On  the  basis  of  these  results  the  second,  fourth,  and  sixth 
traits  on  the  scale  were  dropped. 

Many  suggestions  were  received  from  raters  and  experi- 
menters as  to  improving  the  form.  It  was  discovered  that  rela- 
tively few  raters  used  the  line  between  points  under  which 
descriptive  terms  were  printed.  This  led  to  the  conversion 
of  all  traits  into  five-step  descriptions.  Wording  was  changed 
in  a  few  cases  to  clarify  meaning.  The  revised  scale  (to  be 
called  Scale  II)  is  given  on  Pages  30  and  31. 

Experiment  in  Scale  II  was  begun  with  the  following  ques- 
tions in  mind : 

1.  Do  any  of  these  traits  overlap  too  much? 

2.  Are  any  of  them  too  hard  to  rate  ? 

3.  Is  the  construction  of  the  scales  satisfactory  as  to  word- 
ing? 

4.  Is  the  arrangement  of  the  scale  effective? 


.11 

.179 

.16 

.175 

.03 

.200 

.14 

.176 

.10 

.087 

.28 

.16 

.80 

.071 

.78 

.066 

.50 

.128 

.30 

.157 

.68 

.091 

.79 

.064 

.37 

.149 

.22 

.164 

.80 

.07 

.697 

.084 

.27 

.17 

.66 

.095 

.67 

.083 

.33 

.144 

.33 

.144 
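The correlation procedure in the footnote to the earlier table (rank-difference rho, transmuted to r, with the probable error of r) and the "6 × P. E." criterion used to star entries in Tables III and IV can be sketched in code. This is a minimal sketch under stated assumptions: the transmutation of rho into r is taken to be the usual r = 2 sin(πρ/6), since the description "first described" is not reproduced here, and the function names are hypothetical.

```python
import math


def spearman_rho(ranks_x, ranks_y):
    """Rank-difference formula: rho = 1 - 6*sum(d^2) / (N(N^2 - 1))."""
    n = len(ranks_x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * d2 / (n * (n * n - 1))


def rho_to_r(rho):
    """Assumed transmutation of rho into r: r = 2 sin(pi * rho / 6)."""
    return 2 * math.sin(math.pi * rho / 6)


def probable_error(r, n):
    """Probable error of r: .6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


def exceeds_six_pe(r, n):
    """Starring criterion of Tables III and IV: r greater than 6 x P.E."""
    return abs(r) > 6 * probable_error(r, n)


# Table IV entries (N = 37 men): 1 vs 3 is starred, 1 vs 4 is not.
print(round(probable_error(0.58, 37), 3))                    # 0.074
print(exceeds_six_pe(0.58, 37), exceeds_six_pe(0.31, 37))    # True False
```

With N = 37 and r = .58 the formula gives a P. E. of about .074, close to the .07 printed in Table IV, and the entry passes the 6 × P. E. test; r = .31 does not.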
Included in question No. 4 were two new inquiries. Should the optimum point of the scale be varied from trait to trait, or would such a change confuse raters and scorers? Would an important function of the scale in use be to stimulate raters to observe and record significant behavior in narrative form as well as to evaluate it quantitatively by rating it on the scale? If so, what distribution of space on the scale would most facilitate the gathering of such "behaviorgrams"?*

* For explanation of term "behaviorgram" see p. 50.

1. The same fraternity that had worked with Scale I, after a lapse of ten months, again rated its members using Scale II. Twenty-five members rated each of the thirty-seven members. The following table gives the inter-correlations in terms of r.

TABLE IV.
Inter-correlation Between Traits on Scale II, 25 Ratings on 37 Men.†

Traits      r       P. E.
1 vs 2              .09
*1 vs 3     .58     .07
1 vs 4      .31     .09
1 vs 5      .33     .09
*2 vs 3     .55     .07
*2 vs 4     .61     .06
*2 vs 5     .96     .007
3 vs 4      .28     .101
*3 vs 5     .65     .06
*4 vs 5     .71     .05

* Greater than 6 × P. E.

† The r was obtained by using Spearman's Rank Difference Formula and correcting by transmuting the rho into r.

It would seem likely from the findings that either the second or fifth trait should be discontinued in the next revision because of their inter-correlation coefficient of .98.

2. In deciding whether to drop trait No. 2 or trait No. 5 from the scale, other considerations, of course, come in; for example, the relative difficulty of securing ratings on the two traits. It was not possible to check this point from the fraternity ratings just described, but later in the experiment a number of scales were mailed out to the references furnished by freshmen entering the University of North Carolina and to their instructors after they had arrived at the University of North Carolina. Each freshman was asked at his registration time to submit three references. Scales were mailed to these references in the case of 400 freshmen, making 1200 scales mailed out. Of these, 804 were returned. Faculty ratings on some of these freshmen were later secured, to a total of 145 ratings. The table below shows the percentage of missing ratings on each trait for faculty and for references out in the state, according to the different types of scales used, as well as the total.

TABLE  V. 

The  Percentage  of  Missing  Ratings  by  Traits.    804  Secondary  and 

145  College  Ratings  on  Scales  II,  IV,  V  and  VI. 

Secondary  Ratings 


Order  of 

Trait 

II 

IV 

V 

VI 

Total 

Difficulty* 

I 

.8 

.8 

.7 

Trait  III 

II 

.8 

1.6 

1.4 

1.1 

Trait  V 

III 

3.2 

4.0 

5.5 

1.6 

4.1 

Trait  IV 

IV 

.2 

2.4 

2.2 

1.6 

1.3 

Trait  II 

V 

1.3 

7.2 

5.1 

6.4 

3.9 

Trait  I 

College  Ratings 

I 

30. 

25.9 

12.9 

25.4 

Trait  III 

II 

2.2 

5.2 

none 

3.4 

Trait  V 

III 

48.9 

75.3 

68.6 

66.2 

Trait  II 

IV 

20. 

38.9 

30.1 

used 

31.7 

Trait  IV 

V 

31. 

45.4 

38.7 

40. 

Trait  I 

*  This  column  represents  the  rank  of  the  traits  in  order  of  frequency 
of  missing  ratings,  or  in  order  of  difficulty  of  securing  ratings. 

There  is  obviously  a  greater  difficulty  in  securing  ratings  on 
trait  five  than  on  trait  two.  In  the  case  of  faculty  as  well  as 
secondary  school  ratings  the  difference  is  marked,  especially  in 
the  case  of  faculty  ratings.  The  other  noteworthy  features  of 
the  table  are  the  great  difference  in  the  frequency  of  missing 
ratings  as  between  faculty  and  secondary  school  ratings  and 
the  fact  that  with  the  exception  of  traits  two  and  four,  which 
get  reversed,  the  difficulty  of  securing  ratings  by  traits  is  the 
same  in  each  case. 

3. The question as to wording was not the object of experimentation, but was settled by consensus of opinion among users.

4. To test the effect of varying the direction of the optimum points on the trait scales, an alternative form of Scale II was published (henceforth called Scale III). In Scale III traits one, three and four were printed so that the optimum point of each scale was on the left instead of the right. A second fraternity was used for the next experiment, which consisted in securing seventeen ratings on each of its eighteen members, each rater using both forms of the scale. The resulting inter-correlations were considerably smaller except in one instance. The reliabilities were slightly smaller as a rule, and the respective advantages and disadvantages of these facts, and especially the greater inconveniences of scoring, led to the discontinuance of Scale III. The data are presented in the following tables.

The reliabilities in Table VII were secured by correlating the ratings of 5 raters against 5 on Scale II and 4 against 4 on Scale III. Then, by the Spearman-Brown Formula, the predicted reliabilities of 10 vs. 10 were secured for each scale so as to get the data on both scales in comparable terms.
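The step-up to 10 vs. 10 uses the Spearman-Brown Formula, r_k = k r / (1 + (k - 1) r), where k is the factor by which the number of raters on each side is multiplied: k = 2 for 5 vs. 5 to 10 vs. 10, and k = 2.5 for 4 vs. 4 to 10 vs. 10. The sketch below is illustrative only; the input reliability of .80 is hypothetical, since the 5 vs. 5 and 4 vs. 4 coefficients are not reported.

```python
def spearman_brown(r, k):
    """Predicted reliability when the number of raters is multiplied by k.

    r: observed reliability (e.g. 5 raters vs. 5 raters)
    k: lengthening factor (5 vs. 5 -> 10 vs. 10 gives k = 2;
       4 vs. 4 -> 10 vs. 10 gives k = 2.5)
    """
    return k * r / (1 + (k - 1) * r)


# Hypothetical observed reliability of .80, for illustration only.
print(round(spearman_brown(0.80, 2), 3))    # 0.889
print(round(spearman_brown(0.80, 2.5), 3))  # 0.909
```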

TABLE VI.
Inter-correlations on Scale II Compared with Scale III, 10 Raters Using Scale II and 8 Raters Using Scale III to Rate 18 Fraternity Mates.

                  Scale II                       Scale III
Traits        r      P.E.    C. for At.      r      P.E.    C. for At.
I vs. II      .30    .15     .32             .27    .15     .31
I vs. III     .68    .08     .73             .64    .08     .66
I vs. IV      .66    .08     .83             .07    .23     .07
I vs. V       .60    .09     .65             .10    .20     .11
II vs. III    .41    .12     .44             .64    .08     .72
II vs. IV     .54    .10     .68             .31    .15     .36
II vs. V      .83    .04     .94             .64    .08     .75
III vs. IV    .76    .06     .94             .05    .24     .05
III vs. V     .59    .10     .55             .49    .11     .53
IV vs. V      .75    .06     .95             .27    .13     .34

TABLE VII.
Reliabilities by Traits of Scale II Compared with Scale III.

Trait     Scale II      Scale III
          10 vs. 10     10 vs. 10
I         .93           .92
II        .93           .85
III       .93           .96
IV        .68           .88
V         .91           .90
Comp.     .96           .93

All P.E. for the above were less than .06.
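Table VI does not define "C. for At." If it is assumed to be Spearman's correction for attenuation, r_xy / sqrt(r_xx r_yy), computed with the 10 vs. 10 reliabilities of Table VII, the sketch below reproduces, for example, the Scale II entries .32 (I vs. II) and .83 (I vs. IV) and the Scale III entry .31 (I vs. II). Not every entry in Table VI is reproduced this way, so the assumption should be treated with caution.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / math.sqrt(r_xx * r_yy)


# Scale II, Traits I vs. II: r = .30, reliabilities .93 and .93 (Table VII)
print(round(correct_for_attenuation(0.30, 0.93, 0.93), 2))  # 0.32
# Scale II, Traits I vs. IV: r = .66, reliabilities .93 and .68
print(round(correct_for_attenuation(0.66, 0.93, 0.68), 2))  # 0.83
# Scale III, Traits I vs. II: r = .27, reliabilities .92 and .85
print(round(correct_for_attenuation(0.27, 0.92, 0.85), 2))  # 0.31
```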

A  copy  of  Scale  III  appears  on  Pages  36,  37,  and  38. 
To test the effect of alteration of the scales in securing behaviorgrams,* a new form (herein called Scale IV) was printed, based on the suggestions of Dr. D. T. Howard, Director of Personnel, Northwestern University.

A third fraternity at the University of North Carolina assisted in this experiment by getting ten members to rate each of its seventeen members, five of the ten raters using Scale II and the other five using Scale IV. This allowed eighty-five possible opportunities for obtaining a behaviorgram on a particular trait by use of Scale II and an equal number by the use of Scale IV. Out of the eighty-five possibilities, the actual frequencies of such behaviorgrams for each type of trait were as indicated in Table VIII.

TABLE VIII.

Comparison of Frequencies of Behaviorgrams.

Scales II and IV.

Trait     Frequencies     Frequencies
          Scale II        Scale IV
1         3               29
2         2               34
3         1               29
4         0               21
5         1               25
Total     7               138

As  a  result  of  this  experiment  Scale  II  was  discontinued  and 
a  new  revision  was  printed  May  9,  1929,  in  alternative  forms, 
revisions  "A"  and  "B,"  as  attached. 

The question as to the correctness of wording and placing of the descriptive terms has not been answered except indirectly, through comparison of the reliabilities of Scale I and Scale II. This comparison was made difficult by the fact that only 4 traits appeared both in Scale I and Scale II. It might be expected also that a shift from a mimeographed to a printed form would have some influence. The following table shows the reliability of 5 ratings compared with 5 ratings for Scale I and Scale II on the 4 traits which appeared on both scales.


* The term "behaviorgram," used for the first time on page 39, is used throughout this study to indicate a narration of instances, supporting facts, or anecdotes illustrative of the behavior of the person being rated. See Appendix C for examples of such data secured by this scale. This aspect of the American Council scale is so emphasized as to make it seem advisable to find a single word to use for conveying the idea. Definitely constructing a scale so as to make the collection of behaviorgrams a function of the scale is illustrated by the attached Scale IV.
Thirty-three men were rated by their fraternity mates on Scale I and a year later 37 members of the same fraternity were rated on Scale II. Some of the same raters were involved in both groups, but in each case they were selected by chance from the total number of raters.


TABLE IX.
Reliabilities by Traits of Scale I Compared with Scale II.

          Scale I            Scale II
          5 vs. 5            5 vs. 5
Trait     r      P.E.        r      P.E.
I         .59    .07         .88    .03
II        .46    .09         .87    .03
III       .43    .07         .87    .03
IV        .28    .10         .52    .04

Theoretically a rating on the 5th step of any one trait should be equivalent to a rating on the 5th step of any other trait. That this is not true in actual practice is shown by comparing the frequency with which a rating of any particular value occurs in each of the traits. The relative values are made still less comparable when, as in Scale I, the different trait scales have a varying number of steps identified by a brief description. Chart I following shows the frequency of each step on the 7 traits of Scale I when fraternity mates made a total of 1023 ratings on each trait. Because there are 5-, 4-, and 3-step traits on this scale, it was necessary to indicate that fact along the bottom line of the chart. The curves were so constructed as to make the midpoints of the scales coincide. Chart II shows the much more regular distribution of ratings when, as in Scale II, all traits had been put on a 9-step basis with 5 descriptive terms.

Chart III following shows the sorts of distribution curves which were obtained when ratings by instructors of freshmen and by raters who knew the freshmen as high school students were combined. (In securing these ratings various forms of the scale were used, because small quantities of each form were on hand. None of Scale I or Scale III were used, however.)

The changes in the Leadership Trait (V on Scale I and III on Scale II) consisted in changing from a three-step to a five-step scale and in changing descriptive terms. The result is that the same raters rating the same fraternity mates after a ten-months' interval yield a much more normal distribution. Improved distributions appear in nearly all traits in Chart II compared with Chart I. The right-hand skewness of Trait I on Charts I and II might be attributed to the selective nature of fraternity membership insofar as personal appearance is concerned. The right-hand skewness of all traits in Chart III might be interpreted as due to rating college freshmen against the senior secondary school background, or to the effort of raters to be helpful to the freshmen. There is still room for improvement in constructing the scales so as to make the identical steps on the various traits equivalent in frequency and value.


VI.  Testing  the  Scale's  Reliability 

As indicated in the discussions of the first meeting, the committee felt so sceptical of the reliability of ratings as to regard their proposed experimentation as probably a decent burial of the rating scale, a final and thorough proof of its unreliability under all normal conditions. It is true that rating scale technique is constantly used to validate objective tests. Yet the rating technique does not furnish a specimen of individual response to a standardized situation, as the test technique does, in theory at least, but rather a record in language of what the rater thinks the subject usually does or would do in response to certain classes of situations. Even if the scale calls on the rater to report on observed tendencies to behavior patterns rather than to infer the subject's possession of certain mythical traits or faculties, there still exists the possibility that the behavior observed is unrepresentative (i.e., "impulsive" and accidental rather than "habitual") or that the situation was insignificant or wrongly classified. If we test reliability of ratings by having the same raters repeat ratings after intervals of time, the additional opportunities to observe and real changes in the subject's character might act to decrease reliability coefficients. If we test reliability of ratings by the agreement of different raters making simultaneous ratings, the raters' varying degrees of acquaintance with the subject, their differing terms of observation, and their differing experiences by which they interpret the terminology of the scale all tend to decrease the reliability coefficients. Only one thing can safely be said of a rating: that it records what the rater thinks about the subject, and even here there is a possible language error factor between the rater and the user of the rating. In view of all these error factors, the reliability coefficients obtained in this study are striking.

Interpretation of the reliabilities indicated here should include recognition of two advantages present to an unusual degree in these experiments. First, the scale was most carefully constructed, as indicated in the specifications adopted from the beginning. Possibly most important of all in this respect was the shift from trait nouns to behavior verbs. Second, there is the large number of ratings secured on one person, and the unusual degree of intimacy and the possession of a common standard of comparison to be found in a college fraternity chapter.

The  reliabilities  found  under  various  conditions  of  trial  are 
summarized  in  the  table  on  page  55. 

In the main, these figures substantiate the results obtained in previous investigations. They are consistently high enough in most instances to justify the careful use of such scales. The trait most reliably rated is Trait II, which is probably the most objectively defined and possibly the most highly generalized of all the traits. Appearance and manner is rated least reliably, possibly because it calls for the grader's subjective reaction. Emotional stability has a low reliability, possibly because one's behavior in this respect is so little generalized and so rarely exhibited, the situations which call it forth being relatively rare and varied. Column 2 shows the actual reliability obtained by matching 12 vs. 12 raters, and column 5 the reliabilities predicted by the Spearman-Brown Formula when 3 raters from the group of 12 were matched against each other and the prediction based on the average result. Probably the agreement between columns 2 and 5 would have been greater if all the raters involved in column 2 had been used in column 5 and the average of their reliabilities used as the basis for prediction. This refinement of procedure seemed unnecessary in view of Remmer's (70) previous careful work with the Purdue rating scale to check this particular point. Reliabilities above .90 are not at all common in the literature on rating scales, their frequency in the above tables probably being due to the large number of raters used and the adequacy of their acquaintanceship with the subject being rated. Watson's review lists the following coefficients of reliability as reported by different workers in the field, indicating that these are the maximum obtained but not describing the circumstances under which they were obtained: Barr .80, Freyd .87, Webb .81, Knight and Cleeton .90, Shen .91, Furfey .94, Porteus .97. Kornhauser reports reliabilities as high as .82 for a single rating. This accumulation of evidence indicates pretty clearly that rating can be as reliable as the usual test.

Adams (1) has recently suggested that an objectivity ratio may be obtained by comparing the reliability of a rating repeated by the same rater after an appropriate interval of time with ratings made simultaneously by different raters. The only test of this made in this experiment is somewhat defective because it is involved in changing from Scale I to Scale II, the reliability of the latter being greater. However, two raters were compared with themselves on Scale I and Scale II after the interval of 10 months. Then two other raters were similarly compared. Then the four raters were compared two with two on Scale I and then on Scale II. The mean reliability of the raters as compared with their own ratings was .535. The mean reliability of the four raters compared with each other was .413. This is concerned only with the four traits which appear on both Scale I and Scale II. The objectivity ratio would be .77 on this evidence. (Adams reports ratios as high as .926 on ratings on Industry. He reports average ratios of objective tests as .9925.)
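Taking the objectivity ratio as the mean agreement between different raters divided by the mean agreement of raters with their own later ratings (a reading that reproduces the .77 above; Adams' exact definition is not given here), the arithmetic is a single division. The function name is hypothetical.

```python
def objectivity_ratio(between_raters_r, same_rater_r):
    """Agreement between different raters divided by a rater's agreement
    with his own repeated ratings (assumed reading of Adams' ratio)."""
    return between_raters_r / same_rater_r


print(round(objectivity_ratio(0.413, 0.535), 2))  # 0.77
```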

It will be interesting to know what is the normal reliability of such a scale under usual circumstances of use and what are the sources of unreliability. Table XI below indicates some answer to these questions. The reliability of three ratings by secondary school references on 107 freshmen, as predicted by the Spearman-Brown Formula from the average reliability of three ratings checked against each other, is conspicuously lower than the figures in Table X. Only in Trait II do we get a coefficient higher than .70. One hundred behaviorgrams were selected from the scales sent in on this group of freshmen, and these behaviorgrams were rated by six raters; the actual reliabilities of 3 vs. 3 raters are shown in the second column of Table XI. These would indicate that the unreliability of the ratings in question was caused primarily by a varied knowledge of the individuals concerned. Raters rating the behaviorgrams had just as good a chance to have diverse standards of rating as did the three raters actually rating the students concerned. It would seem that the greatest source of unreliability in rating scales lies in the varied opportunities for observation [and acquaintance with different kinds of behavior on the part of the person rated] rather than in the scale itself or in the processes of judgment. The third column of Table XI shows the reliabilities of three faculty ratings on 31 of these same freshmen, the relatively higher reliabilities of Traits III and IV being due to the fact that so many ratings omitted because of no opportunity to observe were counted as average in computing the rank list. (This procedure has been recommended by Strong on the ground that an omitted rating usually means no outstanding degree of the trait in either direction. In this case, however, it would appear to be due to lack of opportunity to observe the trait in question, and the correlations were raised accordingly.)
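The practice of counting omitted ratings as average before computing a rank list can be sketched as follows. This is a hypothetical helper, not the author's procedure: it assumes "average" means the mean of the ratings actually given on the trait (the middle step of the scale is another possible reading) and gives tied subjects the mean of their ranks.

```python
def ranks_with_omissions(ratings, fill=None):
    """Rank subjects on one trait, counting omitted ratings (None) as average.

    fill: value substituted for an omission; defaults to the mean of the
    ratings actually given (an assumption). Rank 1 = highest rating, and
    tied subjects share the mean of the ranks they occupy.
    """
    given = [x for x in ratings if x is not None]
    if fill is None:
        fill = sum(given) / len(given)
    filled = [fill if x is None else x for x in ratings]
    # Subject indices ordered from highest to lowest rating.
    order = sorted(range(len(filled)), key=lambda i: filled[i], reverse=True)
    ranks = [0.0] * len(filled)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next subject has the same value.
        while end + 1 < len(order) and filled[order[end + 1]] == filled[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


print(ranks_with_omissions([5, None, 1]))     # [1.0, 2.0, 3.0]
print(ranks_with_omissions([4, 4, None, 2]))  # [1.5, 1.5, 3.0, 4.0]
```

Every omission receives the same middle value, so subjects a rater had no chance to observe cluster in the middle of each rank list; when several raters omit the same subjects, their lists agree on those subjects for reasons unrelated to observation.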

TABLE XI.

Reliabilities of Various Ratings.

Trait       (1)          (2)              (3)          (4)             (5)
            107          100              31           37              46
            Freshmen     Behaviorgrams    Freshmen     Fraternity      Behaviorgrams,
                                                       men             Col. Annual
            3 vs. 3      3 vs. 3          3 vs. 3      3 vs. 3         3 vs. 3

I           .349         .88 (.041)       .46                          .74 (.04)
II          .725         .82 (.047)       .46                          .81 (.03)
III         .635         .90 (.02)        .85                          .78 (.03)
IV          .561         .91 (.02)        .71                          .58 (.06)
V           .665         .93 (.01)        .69                          .26 (.08)
Composite                                              .77             .76 (.04)

Column 1. 3 vs. 3 secondary school ratings on freshmen, as calculated by the Spearman-Brown Formula from Product-Moment coefficients.

Column 2. 3 vs. 3 ratings on 100 behaviorgrams representing all points on all traits except point 1 on Trait I, for which no behaviorgram was furnished.

Column 3. Three faculty ratings used to predict the reliability of 3 by the Spearman-Brown Formula.

Column 4. 3 vs. 3 ratings on 37 fraternity men, with the ratings on Traits I, III, IV, and V transposed into equivalent values of II so as to give a weighted average according to values indicated by Chart 3.

Column 5. 3 vs. 3 ratings on behaviorgrams from a college annual of 1916.

Since the reliabilities in column 2 of the preceding table are between 3 vs. 3 raters and are higher in the main than the reliabilities of 5 vs. 5 raters rating actual people, it would appear that a rating scale used to rate actual material of a factual sort [or even an opinion] will yield more reliable results than a scale used to rate persons. This suggests that it might be well to obtain data from various sources on students, then have such data rated by members of the personnel office and the ratings obtained in this way recorded on the personnel record.

Another feature of the preceding table is the rather peculiar fact that the reliability of the composite rating is decreased by weighting the ratings so as to make each point on the various scales equivalent to the proper values on other scales. In Chart 2 it will be noted that the distribution curve of ratings on Trait II resembles the normal distribution curve. Accordingly, percentile values were plotted from this curve to the others, and the rating on each scale was given the value that it should have had according to its relationship to the distributions of Scale II. This device would seem calculated to decrease the errors involved in a composite rating through the incomparabilities of the different scales. The resulting reliability, however, was shown in the table as .77, whereas the reliability of the composite rating obtained without weighting in 5 vs. 5 cases was .93 and in one tabulation of 5 vs. 10 was .86. The same decrease occurred in validity, as reported in the following section on validity.
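The weighting device described above can be sketched as a percentile mapping onto the Trait II distribution. The text gives no computational details, so this sketch rests on assumptions: ratings are numeric steps, ties receive the mid-percentile, and the reference value is taken by nearest rank. The function names and the sample ratings are hypothetical.

```python
def percentile_rank(values, x):
    """Fraction of values below x, counting ties as half (mid-percentile)."""
    below = sum(1 for v in values if v < x)
    equal = sum(1 for v in values if v == x)
    return (below + 0.5 * equal) / len(values)


def value_at_percentile(sorted_ref, p):
    """Rating found at percentile p of the sorted reference ratings (nearest rank)."""
    index = min(int(p * len(sorted_ref)), len(sorted_ref) - 1)
    return sorted_ref[index]


def equate_to_reference(trait_ratings, reference_ratings):
    """Give each rating the value at the same percentile of the reference trait."""
    ref = sorted(reference_ratings)
    return [value_at_percentile(ref, percentile_rank(trait_ratings, x))
            for x in trait_ratings]


# Hypothetical ratings: a trait crowded into the upper steps, mapped onto
# a reference trait (standing in for Trait II).
print(equate_to_reference([6, 7, 8, 9], [2, 4, 6, 8]))  # [2, 4, 6, 8]
```

In this made-up example, ratings crowded into steps 6 to 9 are given the values found at the same percentiles of the reference trait.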

The  reliabilities  of  3  ratings  vs.  3  on  the  46  behaviorgrams 
taken  from  an  old  college  annual  are  intermediate  in  position 
between  the  reliabilities  of  three  ratings  on  freshmen  and 
those  on  the  behaviorgrams  concerning  those  freshmen.  The 
material  in  the  46  behaviorgrams  was  the  usual  college  annual 
material  including  such  items  as  age,  height,  weight,  statistics 
of  membership  in  college  organizations  of  that  period  followed 
by  a  brief  characterization  of  each  individual  by  some  friend. 
(See  Appendix  D.) 

The principal defect in these behaviorgrams as compared with the 100 was that in some cases they did not bear directly on the traits required by this scale, whereas the 100 behaviorgrams were all taken from the traits of the scale and each was rated only on the trait which it was presented to illustrate. Instructions to the 6 raters did not emphasize sufficiently the undesirability of their giving a rating on a trait where they felt the evidence insufficient for that purpose. Accordingly, several confessed at the end of the experiment that they had in some cases assigned arbitrary values. This in part may have been responsible for the relatively lower reliabilities. In spite of this, however, the reliabilities indicate once more that where the same data are furnished, the raters' reliabilities are apt to be raised considerably.

In general it would appear accurate to say that the reliabilities reviewed in this section indicate that the rating scale technique is of sufficient value to continue its use, provided certain precautions are taken to insure adequate knowledge on the rater's part of the data to be rated. Particularly is this true if the scale is carefully constructed.

Solving the Spearman-Brown Formula for N, calculations were made as to how many ratings would be needed to secure reliabilities of .75, .80, .85, .90, and .95 for the various traits. The following table indicates these results. It is interesting to note that only in the case of Trait II could a reliability of .90 be secured with as few as 10 raters, whereas in Table X many reliabilities above .90 were found with fewer raters. Once again the inference would seem logical that the low reliability of ratings is due to the lack of common data instead of defects in the processes of judgment or in the technique of ratings.

TABLE XII.
Number of Ratings Required for Common Reliabilities.

Trait      Number of raters required for r =
            .75     .80     .85     .90     .95
I            17      22      32      50     106
II            3       5       7      10      21
III           5       7      10      16      33
IV            7       9      13      21      44
V             5       6       9      14      29
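The calculation behind Table XII can be sketched in a few lines of Python. The Spearman-Brown prophecy formula predicts the reliability of the average of N ratings from the reliability r1 of a single rating, r_N = N r1 / (1 + (N - 1) r1); solving it for N gives N = r_N (1 - r1) / (r1 (1 - r_N)). The study does not print the single-rater reliability for each trait, so the sketch takes it as an input, and the helper names `stepped_up_reliability` and `raters_needed` are hypothetical. Rounding up to a whole rater is an assumed, conservative choice; the text does not say how its figures were rounded.

```python
import math


def stepped_up_reliability(r1: float, n: float) -> float:
    """Spearman-Brown prophecy: reliability of the mean of n parallel ratings."""
    return n * r1 / (1 + (n - 1) * r1)


def raters_needed(r1: float, target: float) -> int:
    """Spearman-Brown solved for N, rounded up to a whole number of raters.

    r1     -- reliability of a single rating (assumed known, 0 < r1 < 1)
    target -- desired reliability of the averaged rating (0 < target < 1)
    """
    if not (0 < r1 < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    n = target * (1 - r1) / (r1 * (1 - target))
    # A small tolerance keeps floating-point noise (e.g. 4.000000000000001)
    # from adding an extra rater.
    return max(1, math.ceil(n - 1e-9))


r1 = 0.50  # illustrative single-rater reliability, not a value from the study
for target in (0.75, 0.80, 0.85, 0.90, 0.95):
    n = raters_needed(r1, target)
    print(f"r = {target:.2f}: {n} raters (check: {stepped_up_reliability(r1, n):.3f})")
```

Working the formula backward is also informative: the 50 raters listed for r = .90 on Trait I would be consistent with a single-rater reliability of about .15, since .90 = 50r / (1 + 49r) gives r = .9 / 5.9, or roughly .153.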

VII.  Testing  the  Scale's  Validity 
Getting a real test of the validity of this scale has been the most difficult part of the task. In a certain sense reliability is also a measure of validity, in that it shows how consistently the scale predicts the way raters will judge the subject. In other words, to the extent that it is reliable, a rating scale is an objective test of acquaintances' reactions to the persons rated. In that sense, the previous section may be said to have shown that this scale has a considerable degree of validity, particularly when used by fraternity mates on each other. Since one great value of a conduct rating scale would be its determination of abilities not now measured by mental alertness tests or by scholarship grades, an effort has been made to find some criterion of general adjustment capacity or personal effectiveness not immediately involved in those two measures. It was possible to get psychological test scores, college grades, and some additional data on 22 of the 37 fraternity men used in the ratings on Scale II. Among the additional data was a complete list, obtained by an interview with each man, of his participation in the organized activities of the campus. This included all the organizations of which he was a member, the year in which the membership occurred, and any offices or special positions held. All of these activities were listed and ranked independently by three judges according to their leadership significance; the average of these ranks was then used as a point-scale measure of the leadership significance of each activity participation. This afforded a fairly objective criterion of the general nonacademic effectiveness of each of the 22 students in question. Several correlations were made with this material.
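A minimal sketch of how such a leadership score could be assembled follows. The text says only that three judges ranked the activities and that the average rank served as the point value of each participation, so several details here are assumptions: rank 1 is taken as the least significant activity (so a higher average means more leadership weight), a student's score is taken to be the sum of his participation points, and a membership listed in two different years counts twice. The activity names, ranks, and the helpers `activity_points` and `leadership_score` are invented for illustration.

```python
from statistics import mean

# Hypothetical judge rankings: activity -> ranks from three judges.
# Assumption: rank 1 = least leadership significance, so larger averages
# mean more weight.
judge_ranks = {
    "glee club": [1, 2, 1],
    "debating team": [2, 1, 3],
    "class officer": [3, 3, 2],
    "student body president": [4, 4, 4],
}


def activity_points(ranks_by_activity):
    """Average the judges' ranks to get a point value for each activity."""
    return {act: mean(ranks) for act, ranks in ranks_by_activity.items()}


def leadership_score(participations, points):
    """Assumed scoring rule: sum the point values of every participation.

    A participation is (activity, year); the text records the year of each
    membership but does not say how repeats were weighted, so each year
    simply counts once here.
    """
    return sum(points[activity] for activity, _year in participations)


points = activity_points(judge_ranks)
student = [("glee club", 1), ("class officer", 2), ("class officer", 3)]
print(leadership_score(student, points))
```

Summing rather than averaging rewards breadth of participation; either rule would be defensible, and the study does not say which it used.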

Correlations between each of the five traits and this leadership score, and between the composite rating on the five traits and the leadership score, were as follows:

TABLE XIII.
Correlations Between Trait Ratings and Leadership Score.

Trait I vs. Leadership               .67  (.07)
Trait II vs. Leadership              .61  (.08)
Trait III vs. Leadership             .69  (.08)
Trait IV vs. Leadership              .36  (.09)
Trait V vs. Leadership               .58  (.08)
Composite Ratings vs. Leadership     .66  (.07)

None of the above correlation coefficients is corrected for attenuation. A consistent relationship is shown throughout except in the fourth trait, emotional stability. The highest correlation is in the third trait, which refers directly to leadership. The composite rating is not significantly higher than the others and is lower than the ratings on personal appearance and manner and on leadership.

In this same group of 22 men the correlation between leadership score and psychological test score (the American Council test being used) was +.65 (.07).

The relationship between grades and leadership achievement or activities participation is shown by the fact that the correlation between grades and test scores in the group of 22 was .70 (.07), while the correlation between grades and leadership score was .45 (.08). In general it would seem safe to say that the scale measures personal qualities directly related to the achievement of campus leadership and less directly related to the achievement of scholarship. The test score measures qualities directly related to the achievement of scholarship and less directly related to the achievement of campus leadership.

Another approach to this same question of validity in general development was made by trying to discover whether the academic age of a student made any difference in the average rating. No reliable difference in ratings could be discovered. Freshmen rated slightly lower than sophomores and seniors, sophomores slightly lower than seniors, and juniors slightly lower than both sophomores and seniors, seniors of course rating higher than any other class group. These differences, however, were too small to justify quoting and were statistically unreliable.

The next major work on validity was done with scholarship in the University. The table below summarizes the various correlations calculated on this point among the 107 freshmen, on each of whom three ratings were obtained from the secondary school raters, and among 37 Honor Roll and 61 Probation List freshmen, some of whom are also in the list of 107.

AMERICAN  COUNCIL  ON  EDUCATION  RATING  SCALE     61

TABLE XIV. [Correlations between ratings and scholarship for the 107 freshmen rated by secondary school references and for the 37 Honor Roll and 61 Probation List freshmen. The table could not be recovered from the scanned original.]

The low coefficients in the case of the honor roll and probation freshmen might logically be due to the asserted tendency to distort toward the average in rating extreme subjects. The lower correlations of the probation list might be due to the greater tendency of men to do less than their best, as compared with the tendency to excel one's best. All the coefficients in columns 3, 4, 5, and 6 are less than six times their probable error (6 × P.E.). Possibly the lower correlations are in some measure due to the fact that there were only one or two ratings on some of the probation and honor roll cases.
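The six-times-the-probable-error rule used throughout this section treats a coefficient as statistically reliable only when it exceeds six times its probable error. The sketch below uses the conventional large-sample formula of the period, P.E. = 0.6745(1 - r²)/√N. The study does not state which formula it used, so the formula choice and the helper names `probable_error_r` and `exceeds_six_pe` are assumptions, and values computed this way may differ somewhat from the figures printed in parentheses in the text.

```python
import math


def probable_error_r(r: float, n: int) -> float:
    """Conventional probable error of a Pearson r from a sample of n cases."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


def exceeds_six_pe(r: float, n: int) -> bool:
    """True when |r| is more than six times its probable error."""
    return abs(r) > 6 * probable_error_r(r, n)


# Example: r = .50 on the 107 freshmen.
pe = probable_error_r(0.50, 107)
print(f"P.E. = {pe:.3f}; reliable by the 6 x P.E. rule: {exceeds_six_pe(0.50, 107)}")
```

For this coefficient the formula gives a P.E. of about .049, while the text prints (.04), which illustrates the kind of small discrepancy to expect.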

In looking over the preceding table it should be borne in mind that in correlating anything with grades the unreliability of grades must be taken into consideration. Cowdery has suggested that, with the present unreliability of grades, a correlation of +.70 is about as high as can be expected between any variable and college grades. With this in mind it is interesting to note that the composite rating of the 37 fraternity men by 25 of their fraternity mates gives a correlation of +.68 (.05) with the grades of those 37 men (not included in Table XIV). In previous studies made at the University of North Carolina it has been found that scores on the American Council Test and average high school grades have about equal predictive value for first-semester freshman grades in this institution. The test score versus fall semester grades of the 107 freshmen gives a correlation of .50 (.04). The ratings by three raters on Trait II for the same 107 freshmen yield exactly the same coefficient, namely .50 (.04); the composite rating on all the traits correlated with grades .30 (.06). In other words, insofar as these 107 freshmen are representative, the average of three secondary school ratings on Trait II has as high a predictive value for first-term scholarship as does the score on the American Council Test in this institution.

In order to see what could be done by the multiple correlation technique to secure a higher predictive value, multiple correlations were computed as indicated in the table below.

TABLE XV.*
Multiple Correlations on Validity.
107 Freshmen Rated by 3 Secondary School References.

Grades vs. Tests and Trait II                 .63
Grades vs. Tests and Traits I and IV          .60
Grades vs. Tests and Traits IV and V          .54
Grades vs. Tests and Traits II and IV         .53

* Partial Correlation Formula No. 49 (Garrett). Multiple Correlation Formula No. 56 (Garrett).

The most noteworthy feature of the preceding table is the multiple correlation of +.63 obtained between grades on the one hand and test scores and ratings on Trait II on the other.



If Cowdery's estimate is correct, this is nearly as high a degree of relationship as could be secured for this group of students. It is apparent that Trait II records some factor related to scholarship achievement that is not entirely duplicated by the American Council Test. The relative significance of the kind of acquaintanceship between the rater and those rated is indicated by the fact that Trait II correlates with grades +.15 (.09) when three faculty members rate 31 freshmen and +.50 when three secondary school references rate 107 freshmen, while the composite rating on all traits correlates +.68 when 25 fraternity mates rate 37.
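For two predictors, a multiple correlation like those in Table XV can be computed from three zero-order coefficients with the standard formula R = sqrt((r12² + r13² - 2 r12 r13 r23) / (1 - r23²)), where 1 is the criterion (grades) and 2 and 3 are the predictors. The sketch below uses that textbook formula; it does not reproduce Garrett's numbered formulas cited in the table footnote. The study does not report the correlation between test score and Trait II rating, so the value of .26 used below is an assumption, and `multiple_r` is a hypothetical helper name.

```python
import math


def multiple_r(r12: float, r13: float, r23: float) -> float:
    """Multiple correlation of criterion 1 with predictors 2 and 3.

    r12, r13 -- correlations of the criterion with each predictor
    r23      -- correlation between the two predictors (|r23| < 1)
    """
    if abs(r23) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)
    return math.sqrt(r_squared)


# Grades vs. test (.50) and grades vs. Trait II (.50) come from the text;
# r23 = .26 is an assumed test/Trait II intercorrelation, not a reported figure.
print(round(multiple_r(0.50, 0.50, 0.26), 2))  # 0.63
```

With both zero-order coefficients equal to .50, the formula reduces to R² = .5 / (1 + r23), so a multiple correlation of .63 corresponds to a test/Trait II intercorrelation of roughly .26. The smaller the overlap between the two predictors, the more the second one adds.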

The preceding discussion is, of course, by no means an exhaustive treatment of the possible validity of the rating scale. However, inasmuch as this particular scale was constructed with the definite aim of finding one that would be of value in the selection and guidance of college students, it would be expected to be valid, insofar as any available objective criteria are concerned, mainly in the directions of scholarship and adjustment to campus life.

One of the difficulties in getting satisfactory validities in rating scale work probably lies in the great complexity of the criterion against which the scale is to be checked. There are so many variations of intellectual development and of things like leadership and scholarship, and so many complex factors enter in to produce the result, that high validity will probably not be secured until more work has been done in analyzing out the traits involved and in standardizing the measures of the criterion itself.

Our results in this table are not out of line with work elsewhere. Manson reports (unpublished) from Michigan a correlation of .4333 between a secondary school principal's rating on intellectual performance and first-semester grades in 1072 cases; and Hartsorn of Oberlin reports (unpublished) a correlation of .568 between the ratings of school principals on four items and grades at that institution in the case of 159 men students.

In addition to success in scholarship and campus activities while in college, it would be interesting to know whether a rating scale technique could predict, with any degree of certainty, success after graduation. It is extremely difficult to find any method of attacking this problem. The one used in this case is probably not very worthwhile, but the result would appear to be somewhat interesting. The reliability of the ratings on the 46 behaviorgrams from a college annual of 1916 has been previously listed. Four members of the graduating class involved were asked to rank the 46 men according to the relative degree of success which had attended their work in the last 14 years. Money was not to be considered, nor was fame so far as the general public was concerned; primarily it was to be an estimate of the degree to which each man had achieved eminence in his own particular field. These four rank lists, two of them complete and the other two less complete, were combined by assigning percentile rank values to each of the 46 men on each of the four lists and from these computing an average percentile rank. This percentile rank was correlated with the ratings on the five traits and with the composite rating given by the six raters, none of whom were among the men who evaluated the degree of success. The resulting coefficients were as follows: Trait I .20 (.04); II .40 (.05); III .30 (.07); IV .25 (.08); V .30 (.07); Composite .87 (.06). While only two of these, the coefficients on Trait II and the Composite, are more than six times the probable error, the fact that all are positive correlations of fair size would indicate an interesting check on validity. Considering the possibilities for error arising from the insufficiency of data before the people who rated the behaviorgrams, and the difficulty of evaluating success, these figures appear rather interesting.
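The procedure of combining four rank lists, two of them incomplete, by averaging percentile ranks can be sketched as follows. The conversion PR = 100 - (100R - 50)/N, where R is a man's rank and N the number of men on that list, is a common textbook convention and is assumed here, since the study does not give its formula. Each man is averaged only over the lists on which he appears. The names and the helpers `percentile_rank` and `average_percentile_ranks` are hypothetical.

```python
from statistics import mean


def percentile_rank(rank: int, n: int) -> float:
    """Percentile rank of position `rank` (1 = most successful) among n names."""
    return 100 - (100 * rank - 50) / n


def average_percentile_ranks(rank_lists):
    """Average each man's percentile rank over the lists on which he appears.

    rank_lists -- lists of names, each ordered from most to least successful;
                  an incomplete list simply omits some names.
    """
    scores = {}
    for ranking in rank_lists:
        n = len(ranking)
        for position, name in enumerate(ranking, start=1):
            scores.setdefault(name, []).append(percentile_rank(position, n))
    return {name: mean(prs) for name, prs in scores.items()}


lists = [
    ["Adams", "Baker", "Clark", "Dodd"],  # complete list
    ["Baker", "Adams", "Dodd", "Clark"],  # complete list
    ["Adams", "Clark"],                   # incomplete list
]
for name, pr in sorted(average_percentile_ranks(lists).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pr:.1f}")
```

Converting to percentile ranks first puts a short list and a full list on a common 0 to 100 scale before averaging, which is what makes the incomplete lists usable at all.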


VIII.  Using  the  Scale 

Of all the suggestions made for safeguarding the accuracy of the rating scale, the experience of this study indicates the most valuable to be the raters' adequate acquaintance with those to be rated. Membership in a fraternal organization apparently provides this to a high degree, and references secured from freshmen prior to their arrival, or secondary school principals and teachers, apparently have adequate acquaintance. Instructors of freshmen, on the other hand, apparently do not have sufficient knowledge of their students at the end of a quarter's work together to furnish satisfactorily reliable ratings; they omit too many traits in their ratings on the present scale. Possibly a scale might be constructed which would be more acceptable to college instructors; this applies particularly to traits other than Trait II. No special effort was made at any point in this experiment to train raters, because it was desired to try the scale out under the conditions of its most frequent use. Possibly instructors, given a period of training, could use this scale effectively.

Evidence obtained in this study indicates the great likelihood that a scale could be constructed which would obtain from secondary schools information that, supplemented by psychological test scores, would make possible more accurate prediction of scholastic achievement than is now available from other sources. For this purpose, however, the quantity of statistical analysis required would be very similar to that required for the standardization and construction of tests themselves; this sort of work has hardly been begun on rating scales as yet.

While the educational value of ratings to the subjects has been pointed out repeatedly, comparatively little has been said of their educational value to the raters. The many very illuminating "behaviorgrams" obtained by the use of this scale, with its excellent instructions for the making of such "behaviorgrams," indicate that it would serve to direct raters' attention to significant modes of behavior on the part of their students. In this way the scale might become a very important tool in developing the personnel point of view among college and secondary school instructors.
If, as was suggested by some of the later quotations from the literature on scales, the next step in the development of such devices follows the lines of the early work on handwriting scales and other matching methods, this scale itself would serve in part to collect the necessary material. More than 100 very excellent "behaviorgrams" were obtained from the ratings on 107 freshmen by the secondary school references. Samples are included in Appendix C. Ratings on these samples were highly reliable, as previously noted. Possibly scales constructed from these samples would be more reliable than this scale itself.

In any event, two sorts of statistical studies should be made in the future use of this scale. In the first place, any institution using the scale should make its own analyses of the peculiarities of its raters and of the significance of the traits for its own success criteria. In the second place, some central agency should distribute these scales and receive the results for tabulation and standardization on a large scale. A standardized pencil-and-paper test needs such large-scale statistical correction and analysis; this is even more the case for a device which undertakes to measure more complex, more variable, and less well-objectified factors. From the indications of this study it is at any rate of doubtful value for any office to use this scale for record purposes without a very careful check on its significance for those purposes in the particular situation in which it is used. While this scale seems to be an improvement on previous attempts in this direction, there are certainly indications that it may be utterly useless and a sheer waste of time unless carefully studied under the particular set of circumstances involved in each case. All of the general principles listed at the conclusion of the historical survey prefixed to this study are thoroughly pertinent to the use of this particular scale.


IX.  Discussion  and  Summary  of  Conclusions 

The American Council scale seems to have included in its construction most of the well-established features of scale construction since Galton's time. A family tree of the inherited traits in this scale might read something like the following: from Galton, the concept of normal distribution of traits in a population, the emphasis on achievement rather than static traits, and the method of matching standard descriptions; from Miner and the Scott Company, the statistical selection of traits to be used and the dot-on-line or graphic representation; from Thorndike, the effort at elimination of halo by checking inter-correlations and varying the directions of the optimum; from Hollingworth, the behaviorgram or narration of fact; from Rugg, realization of the need for searching statistical verification; from Poffenberger, the use of the scale to evaluate data already gathered in other ways.

It seems to the writer that the American Council scale has greatly improved the practice in the field of college rating in three respects: the rather complete shift from the use of adjectives to the use of verbs; the intensification of the effort to get pertinent data on character and personality by making the behaviorgram a fundamental part of the scale; and the selection of an adequate classification of traits. Furthermore, it is striking to notice that this particular scale comes nearer the standard procedures inaugurated by Galton than does any other scale since his time. The only other scale observed to come as near in this respect is the Scott Company scale for industrial use. The principal source of deviation in the past from Galton's methods has been a shift of emphasis from observed achievements and habits to inferred unitary faculties or traits. Another fertile source of confusion has been the failure to realize the problems involved in constructing a linear scale whose units of measurement should have comparable significance. What were really mere comparative terms have been used as if they were quantitative units.

The evolution of the rating scale is in some ways strikingly like that of the test program, in which most workers prior to Binet were testing a number of supposedly unitary traits, faculties, or personal qualities, and then trying to find a combination of these which would measure intelligence. Binet, of course, shifted immediately and directly to a test of the total power of the personality to meet a standardized, objective, significant situation. It is the writer's belief that rating should proceed, for the time being, along this line.

Thorndike and G. W. Allport (4) have both stated emphatically the other major problem involved in the theory of rating. Thorndike's observations on halo (89) as producing much of the inter-correlation between traits, and his observations on the Fundamental Theorems (90) of judgment, are pertinent here. In the latter he points out that two traits may not unite in a linear way to produce an outcome; rather, different degrees of one quality may greatly affect the way in which the other trait operates, so that we may have curvilinear relationships of several sorts. Allport insists that the unity of the personality cannot be dispensed with as a dynamic concept. Not only the amount of a certain trait or tendency, but also the amount of other traits and the "way in which these traits" are integrated in the particular personality under study, enter into the responses of that personality to the situations of life. The preceding are good theoretical grounds for continuing to stress habits, conduct, and achievement, rather than traits or faculties, in the construction of scales.

It is generally agreed that personality and character are most typically, if not exclusively, manifested in social situations; that is, in interactions between persons. From this it would appear that the subject's actual behavior in social situations and the reaction of people to the subject are the most promising means of measuring character and personality and their changes. Except for some tests of emotional abnormality, the field at present offers few objective, standardized pencil-and-paper tests of personality and character traits. A more promising alternative would seem to be securing data from acquaintances concerning a subject's behavior and their reaction to his behavior in real situations. This is the function of the rating scale. Possibly we can never altogether dispense with it in this field. In view of the current interest in personality development and character training as parts of the educational process, work to improve rating scales for such traits would seem to have been worthwhile. The American Council on Education Scale combines for the first time most, if not all, of those qualities which have been generally recommended for a good rating scale. Its formulation, and its further refinement through use, would then appear to have been the beginning of a real contribution toward finding adequate tools to assist in this very important side of the educational process, both in schools and in industry.


Summary 

1. The use of rating scales to develop a more scientific understanding of personality and character has a history of at least sixty years.

2.  Between  1869  and  1916  there  appears  to  have  been  more 
retrogression  than  progress  in  the  understanding  with  which 
scales  were  constructed  and  used. 

3. Miner's Scale, the Scott Company Scale, and Freyd's Graphic Scale all placed the development of the scale back on its original track.

4. The American Council Scale includes more of the accepted features and excludes more of the undesirable ones than the other scales examined in this study.

5. The reliabilities of this scale are as high as those of any scale previously studied intensively.

6. Its validity coefficients are at least suggestive of its possible value in college personnel administration.

7. It is of doubtful value for record purposes unless raters are trained, their variability studied, and the validity of ratings submitted to careful check in each situation where they are to be used.

8. Further development of this and similar scales would probably be best stimulated and directed if "habits" and "achievements" were still more frequently substituted for "traits," if behaviorgrams were accumulated and evaluated, and if some cooperative provision for accumulating, tabulating, and statistically evaluating many items from many sources could be inaugurated, so as to give scales somewhat the same treatment as is at present recognized as necessary for tests.




BIBLIOGRAPHY 

1. Adams, H. F. An Objectivity-Subjectivity Ratio for Scales of Measurement. Soc. Psychol., 1930, 1, 122-135.

2. Allport, F. H. and G. W. Personality Traits: Their Classification and Measurement. J. Abn. Psychol., 1921, 16, 6-40.

3. Allport, G. W. Personality and Character. Psychol. Bull., 1921, 18, 441-455.

4. Allport, G. W. The Study of the Undivided Personality. J. Abn. Psychol., 1924, 19, 132-141.

5. Allport, G. W. Concepts of Trait and Personality. Psychol. Bull., 1927, 24, 284-293.

6. Barr, A. S. A Study in Social Rating. Scho. Home Educ., 1921, 41.

7. Bingham and Freyd. Procedures in Employment Psychology. N. Y., 1927.

8. Brogan, A. P. A Study in Statistical Ethics. The Internat. J. Ethics, 1923, 33, 119-135.

9. Brotemarkle, R. A. A Comparison Test for Investigating the Ideational Content of the Moral Concepts. J. App. Psychol., 1922, 3, 235-242.

10. Cady, V. M. The Estimation of Juvenile Incorrigibility. J. Delin., 1923, Mono. No. 2.

11. Cattell, J. McK. Homo Scientificus Americanus. Sci., 1903, 17, 561.

12. Character Education Institution. Character Education Methods, The Iowa Plan, $20,000 Award. 1922, 5, 280-294.

13. Chassell, C. F. and Chassell, E. B. A Test and Teaching Device in Citizenship for Use with Junior High School Pupils. Educational Administration and Supervision, 1924, 10, 7-29.

14. Chassell, C. F., Chassell, E. B., Chassell, L. M. A Test of Ability to Weigh Foreseen Consequences. T. C. Rec., 1924, 25, 37-50.

15. Chassell, Upton and Chassell. Scales for Measuring Habits of Good Citizenship. T. C. Rec., 1922, 23.

16. Cleeton, G. V. and Knight, F. B. Validity of Character Judgments Based on External Criteria. J. App. Psychol., 1924, 8, 215-231.

17. Colvin, S. S. Principles Underlying the Construction and Use of Intelligence Tests. Twenty-First Year Book of the National Society for the Study of Education.

18. Conklin, E. S. and Sutherland, J. W. A Comparison of the Scale of Value Method with the Order of Merit Method. J. Exper. Psychol., 1923, 6.

19. Downey, J. The Will Profile: A Tentative Scale for Measurement of the Volitional Pattern. Univ. Wyoming Bull., 1919, 1-40.

20. Downey, J. The Will Temperament and Its Testing. N. Y., 1923.

21. Dunlap, K. The Reading of Character from External Signs. Scient. Mo., 1922, 20, 133-165.

22. Filter, R. O. An Experimental Study of Character Traits. J. App. Psychol., 1921, 5, 297-317.

23. Finklestein, William and Williams, J. F. Correlation of Efficiency Tests: Preliminary Report. J. Amer. Med. Asso., 1922, 78, 1454-1455.

24. Franz, S. I. Handbook of Mental Examination Methods. N. Y., 1919.

25. Freyd. The Graphic Rating Scale. J. Educ. Psychol., 1923, 14, 83-102.

26. Furfey, Paul. An Improved Rating Scale. J. Educ. Psychol., 17, 45-48.

27. Galton, F. Hereditary Genius. Lond., 1914.

28. Garrett, H. E. An Empirical Study of Various Methods of Combining Incomplete Order of Merit Ratings. J. Educ. Psychol., 1924, 15, 157-172.

29. Hackett, J. D. Rating Legislators. Pers. J., 1928, 7, 130-131.

30. Haggerty, M. E. Character Education and Scientific Method. J. Educ. Res., 1926, 13, 233-248.

31. Hanna, J. V. Variable Factors Encountered in the Rating of Students. School and Soc., 1925, vol. 25.

32. Hart, H. N. A Test of Social Attitudes and Interests. Iowa St. Child Welfare, vol. 2.

33. Hayes, M. H. S. and Paterson, D. G. Experimental Development of the Graphic Rating Method. Psychol. Bull., 1921, 18, 98-99.

34. Heidbreder, E. Experimental Study of Thinking. Arch. Psychol., 1924, no. 93.

35. Hollingworth, H. L. Experimental Studies in Judgment. Arch. Psychol., 1913, no. 29.

36. Hollingworth, H. L. Judging Human Character. N. Y., 1925.

37. Hollingworth, H. L. Psycho-physical Continuum. J. Phil., 1916, 13, 182-219.

38. Hollingworth, H. L. Psychology of the Functional Neuroses. N. Y., 1920.

39. Hughes, W. H. A Rating Scale for Individual Capacities, Attitudes and Interests. J. Educ. Meth., 1923, 3, 56-65.

40. Intelligence and Its Measurement: A Symposium (1921). Contributions by E. L. Thorndike, L. M. Terman, F. N. Freeman, S. S. Colvin, Rudolph Pintner, B. Ruml, S. L. Pressey, V. A. C. Henmon, Joseph Peterson, L. L. Thurstone, Herbert Woodrow, W. F. Dearborn, M. E. Haggerty and B. R. Buckingham. J. Educ. Psychol., 14, 123-148.

41. Kelley, T. L. The Principles Underlying the Classification of Men. J. App. Psychol., 1919, 3.

42. Kingsbury, F. A. Analyzing Ratings and Training Raters. J. Pers. Res., 1922-1923, 1, 377-383.

43. Kingsbury, F. A. Making Rating Scales Work. J. Pers. Res., 1925, 4, 1-6.

44. Knight and Franzen. Pitfalls in Rating Schemes. J. Educ. Psychol., 1922, 13, 204-213.

45. Knight, F. B. The Effect of the Acquaintance Factor upon Personal Judgments. J. Educ. Psychol., 14, 129-142.

46. Kohs, S. C. Ethical Discrimination Tests. J. Del., 1922, 9, 1-15.

47. Kornhauser, A. W. Rating Scales (Various Articles). J. Pers. Res., 5, nos. 5, 8, 9, 11, 189-193, 309-317, 338-344, 440-446.

48. Laird, Donald. Psychology of Selecting Men. N. Y., 1927, 143, 179, 345.

49. Landis, C. The Justification of Judgments. J. Pers. Res., 1925-1926, 4, 7-19.

50. Manson, Grace E. A Bibliography of the Analysis and Measurement of Human Personality up to 1926. Reprint and Circular Series of the Nat'l Res. Coun., 1926, 72, 59.

51. Marsh, S. E. and Perrin, F. A. C. An Experimental Study of the Rating Scale Technique. J. Abn. Psychol., 1925, 19, 383-399.

52. May, M. A. What Science Offers on Character Education. Relig. Educ., 1928, 23, 566-583.

53. May, M. A. and Hartshorn, H. Objective Methods of Measuring Character. Ped. Sem., 1925, 32.

54. Meier, N. C. A Study of the Downey Test by the Method of Estimates. J. Educ. Psychol., 1923, 14, 385-394.

55. Miner, J. B. The Evaluation of a Method for Finely Graduated Estimates of Abilities. J. App. Psychol., 1917, 123.

56. Norsworthy, N. The Validity of Judgments of Character. Essays in Honor of William James, 1910 (2d Edition), 542-552.

57. Paterson, D. G. Methods of Rating Human Qualities. Ann. Amer. Acad. Pol. Soc. Sci., 1923, 110, 81-93.

58. Paterson, D. G. The Graphic Rating Scale. J. Pers. Res., 1922-1923, 1, 361-370.


59. Payne, E. G. Education in Health. Lyons & Carnahan, 1921.

60. Payne, E. G. A Scale of Measuring Habits and Practices in Health and Accident Prevention. Sch. Soc., 1923, 17, 25-28.

61. The Personnel System of the U. S. Army, Section on Rating Scales. Government Printing Office, Washington, D. C. 2 volumes.

62. Plant, J. S. Rating Scheme for Conduct. Amer. J. Psychiat., 1922, 1, 547-572.

63. Poffenberger, A. T. Measures of Intelligence and Character. J. Phil., 1922, 19, 261-266.

64. Poffenberger, A. T. A Critical Examination of the Usual Employment Methods. Ann. Amer. Acad., 1923, 110, 13-21.

65. Poffenberger and Carpenter. Character Traits in School Success. J. Exper. Psychol., 1924, 7, 67-74.

66. Porteus, S. D. A Study of Personality of Defectives with a Social Rating Scale. Publ. Vineland Train. Sch., 1920, 23.

67. Pressey, S. L. A Group Scale for Investigating the Emotions. J. Abn. Psychol., 1921, 16, 55-64.

68. Pressey, S. L. "Cross-Out" Tests, with Suggestions as to Group Scale of the Emotions. J. App. Psychol., 1919, 3, 138-150.

69. Pressey, S. L. and L. W. First Revision of a Group Scale Designed for Investigation of the Emotions with Tentative Norms. J. App. Psychol., 1920, 4, 97-104.

70. Remmers, H. H. Empirical Study of the Spearman-Brown Formula as Applied to the Purdue Rating Scale. J. Educ. Psychol., 1927, 18, 187-195.

71. Roback, A. A. A Bibliography of Character and Personality. Cambridge, Mass., 1927.

72. Roback, A. A. The Psychology of Character with a Survey of Temperament. N. Y., 1927.

73. Rogers, Agnes L. A Tentative Inventory of Habits Issued by the Department of Kindergarten First-grade Education of Teachers College. T. C. Bull., 1922, 14th Series.

74. Ruch, G. N. A Preliminary Study of the Correlation between Estimates of Volitional Traits and the Results from the Downey Will Profile. J. App. Psychol., 1921, 5, 159-162.

75. Ruch, G. N. and Del Manzo, M. C. The Downey Will-Temperament Group Test: A Further Analysis of Its Reliability and Validity. J. App. Psychol., 1923, 7, 65-76.

76. Rugg, H. O. Is the Rating of Human Character Practicable? J. Educ. Psychol., 1921, 1922, 12, 425-438, 485-501; 13, 30-42, 81-93.

77. Scott, W. D. The Rating Scale. Psychol. Bull., 1918, 15, 203-206.

78. Shelton, W. H. Social Traits and Morphologic Types. J. Pers. Res., 1927-1928, 6, 47-55.

79. Shen, E. Influence of Personal Friendship Upon Ratings. J. App. Psychol., 1925, 9, 65.

80. Shuttleworth, F. F. The Measurement of the Character and Environmental Factors Involved in Scholastic Success. Iowa St. Character, 1927, 1, 80.

81. Shuttleworth, F. K. A New Method of Measuring Character Traits. Sch. Soc., 1924, 19, 679-682.

82. Slawson, J. The Reliability of Judgments of Personal Traits. J. App. Psychol., 1922, 6, 161-171.

83. Spearman, C. The Abilities of Man. N. Y., 1927.

84. Starbuck, E. M. Tests and Measurements of Character. Nat'l Educ. Ass. Proc., 1924, 62.

85. Symonds, P. M. The Present Status of Character Measurement. J. Educ. Psychol., 1924, 15, 484-498.

86. Symonds, P. M. On the Loss of Reliability in Ratings Due to Coarseness of the Scale. J. Exper. Psychol., 1924, 7, 456-461.

87. Terman, L. M. Genetic Studies of Genius. 1925, vol. 2, chaps. XVII and XVIII.

88. Thorndike, E. L. Educational Psychology. Vol. 1.


89. Thorndike, E. L. A Constant Error in Psychological Rating. J. App. Psychol., 1920, 4, 25-29.

90. Thorndike, E. L. Fundamental Theorems in Judging Men. J. App. Psychol., 1918, 2, 67-76.

91. Thurstone, L. L. Attitudes Can Be Measured. Amer. J. Soc., 1928, 33, 529-554.

92. Travis, R. C. The Measurement of Fundamental Character Traits by a New Diagnostic Test. J. Abn. Psychol., 1924, 19, 400-420.

93. Upton, M. A. and Chassell, C. F. A Scale for Measuring the Importance of Habits of Good Citizenship. T. C. Rec., 1919, 20; T. C. Bull., 1921, 12th Series.

94. Vernerka, M. Madilene. A Habit Curriculum for the Four-year-old or thereabouts. J. Educ. Meth., 1923, 3, 121-123.

95. Voelker, P. F. The Function of Ideals and Attitudes in Social Education: An Experimental Study. T. C. Cont. Educ., no. 112.

96. Walters, J. E. Description of Procedures of the Personnel System for the Schools of Engineering, Purdue University. J. Engineering Education, New Series, vol. 18, no. 5, Jan. 1928.

97. Watson, G. B. A Supplementary Review of Measures of Personality Traits. J. Educ. Psychol., 1927, 18, 73-87.

98. Webb, E. Character and Intelligence. Brit. Psychol. Monog., 1915, 1, no. 3.


Appendix  A 
Sample of the form letter sent to the references offered by
applicants for admission to the University of North Carolina.
Each letter was accompanied by one of Scales II, IV, V or VI.

THE  UNIVERSITY  OF  NORTH  CAROLINA 
Chapel  Hill,  N.  C. 
OFFICE OF
DEAN OF STUDENTS

November  23,  1929 
Dear ______:

Mr. ______, a member of the Freshman Class here, has indicated to
us, at our request, that you have known him well enough prior to his
coming here to describe some of his personal characteristics. The University
is cooperating in a piece of nation-wide research to discover, if
possible, better methods of dealing with students as individuals. A part
of this project has to do with the use of what are known as "Rating
Scales." I am enclosing a copy of such a scale on which I will ask you
to rate the young man in question as accurately as possible. The scale
is self-explanatory, but I will be very glad to answer any questions that
still remain in your mind after reading its contents. Please use the return
stamped envelope to send in the scale filled at your earliest convenience.

Very  cordially  yours, 

Francis  F.  Bradshaw 
Appendix B
Samples of letters sent to members of the University of
North Carolina faculty. The first of these got very little response;
the second was used as a follow-up. Attached to each letter
was a list of the men in the study who were in that instructor's
classes, and a scale was enclosed for each of these men.
Scales II, IV, V and VI were used.

Chapel  Hill,  N.  C. 
December  23,  1929. 
Dear  Sir: 

This  office  is  cooperating  with  similar  personnel  offices  at  Stanford 
University  and  the  University  of  Minnesota  in  a  trial  of  the  American 
Council  personality  rating  scale.  A  part  of  this  experiment  consists  in 
comparing  ratings  given  freshmen  by  their  secondary  school  instructors 
with  those  given  the  same  freshmen  by  their  college  instructors  during 
the first term. We have already secured such ratings from the secondary
school people on about 100 freshmen. I am enclosing scales for such
of  these  freshmen  as  have  been  in  your  classes  during  this  quarter. 

Would  you  be  good  enough  to  fill  them  out  as  completely  as  possible 
and  return  them  to  this  office  in  the  enclosed  envelope  as  soon  as  may  be 
convenient  to  you.  If  you  need  any  additional  explanation  of  the  scale 
or  the  experiment,  I  shall  be  glad  to  answer  any  such  inquiries. 

Cordially  yours, 

Francis  F.  Bradshaw 

THE  UNIVERSITY  OF  NORTH  CAROLINA 

Chapel  Hill,  N.  C. 
OFFICE OF
DEAN OF STUDENTS

February  7,  1930. 
Dear  Sir: 

I  am  enclosing  rating  scales  for  some  of  the  freshmen  being  instructed 
by  you  this  quarter  according  to  the  records  of  the  Registrar's  office.  May 
I  ask  the  favor  that  you  check  at  the  appropriate  points  on  these  scales 
and return them at your earliest convenience to this office in the enclosed
addressed envelope. This is not a regular part of the routine of
this  office  nor  is  it  about  to  become  so,  but  is  a  special  task  which  this 
office  has  undertaken  in  cooperation  with  other  institutions  and  the 
American  Council  on  Education.  I  should  appreciate  your  cooperation 
in  this  matter. 

If  the  request  is  not  self-explanatory  after  you  read  the  scales,  be 
sure  to  let  me  explain  it  further. 

Most  cordially  yours, 

Francis  F.  Bradshaw 
Appendix C
Samples of behaviorgrams received from secondary school
raters. These were selected from about 300. They do not all
describe the same student, but were drawn from the entire group.

TRAIT  A 

1. He is liked by some and they are the ones that he expected something
in return from. He was not always willing to do for others, such
as: Sometimes it was hard for a number of the Seniors to get to some
activities at night and he would not use his car to help them.

2. I observed that fellow students disliked his somewhat too great
self-assurance. He had not learned to make himself a member of a
group when he left here. He had an overpowering ambition. Nothing
too hard for him.

3. "A likable boy, but has an over supply of brass." A typical remark
heard of various teachers.

4. I feel that perhaps "tolerated" is not quite the word to be used
here, but was certainly not a universal favorite among the pupils
of ______ High School. He has been in my classes for three years and
was in my homeroom last year.

5. Was wanted by several for a roommate. Although a member of a
group who had known each other before they came to ______, he quickly
became well-known and well-liked by a majority of the school.

6.  The  home,  when  he  is  present,  takes  on  a  festive  atmosphere.  The 
ringing  of  the  phone  and  doorbell  and  cars  parked  in  the  driveway  attest 
to  his  agreeable  personality. 

7. My mental picture of him is that of a boy who nearly always had
one or two companions with him with whom he was talking in an unusually
pleasant, cheerful way. He was regularly invited to all parties
by boys and girls. He was constantly willing and really glad to take
people about in his automobile.

8.  Older  people  often  remark  on  his  manners  and  charm.  The  teachers 
in  the  high  school  admired  him.  He  was  a  general  favorite.  The 
rougher  element  among  the  boys  at  first  made  fun  of  him,  because  the 
girls  all  thought  him  so  handsome,  but  he  won  their  respect  by  sheer 
indifference  to  their  teasing  and  by  hard  work  on  the  football  team. 

TRAIT B

1. Mr. ______ was probably the most discouraging and at times irritating
student I have ever had, much of the time refusing to do his
work. His teachers kept him after school many times. We all used
every incentive we could think of. In the end he had to be tutored in
the summer after he should have graduated in order to receive his diploma.

2. He told me that his high school record was poor, because he had
rarely studied. While at ______ he studied very conscientiously, but on
rare occasions his old habits reasserted themselves and a poor mark
or a word of advice from some one else was necessary to get him to
serious work again.

3.  The  father  was  interviewed  several  times  to  stimulate  the  boy  to 
put  forth  satisfactory  effort.  Did  not  arouse  himself  to  really  work  hard 
for  college  admission  until  the  senior  year  was  partly  over. 

4. Became restless last spring and began flunking all subjects. Said
he had given up college; didn't know why. Discussed himself, good and
bad points impersonally for an hour. Result: He went back to work
again; decided to go to college. Has a brilliant mind, but hates prolonged
use of it.
5. Work assigned by me was always done promptly. He did not seem
to  have  the  habit  of  procrastination  so  generally  found  in  high  school 
students.     He  was  efficient  in  what  he  did. 

6.  In  his  course  in  chemistry  last  year  he  analyzed  the  water  from 
some  springs  in  the  community,  also  some  mineral  deposits  found  near 
by.  He  also  entered  a  national  essay  contest  sponsored  by  the  American 
Chemical  Society. 

7. Although now a student at U. N. C. he has contributed several articles
to the high school magazine. He voluntarily entered two literary
contests last year and won honorable mention in one of them. For two
years he read "extra" parallel and looked up "extra" reports.

8. He made a slide rule and studied its theory and practice of his own
accord. Conducts experiments in electricity in his own home.

TRAIT  C 

1. He is not an executive, often doing all things more quickly
than he could get others to do them. He held numerous high school
offices, however.

2. Genial, pleasant and well liked, but not a leader, although his company
was much sought by a group of boys. Very pleasantly disposed and
a  little  too  easy-going.  In  one  important  instance  he  was  led  into  serious 
difficulty,  we  believed,  because  it  was  easier  to  go  along  rather  than  to 
definitely  assert  himself  and  be  independent. 

3. Not a leader. Only senior boy in school, and playing on two athletic
teams; not mentioned for captains of either. Elected to very few offices
in school, but serves well in such offices.

4.  Never  aspired  to  class  office  and  resigned  when  nominated  by  it, 
due  not  to  lack  of  ability  nor  timidity,  but  to  "rather  not  be  bothered" 
attitude. 

5.  Could  have  held  many  offices,  but  didn't  care  to  bother. 

6. He usually managed to get others to do what he wished. He seemed
successful as class officer, especially planning and carrying out Junior-Senior
Reception.

7. Whenever I desired to have a thing done and done well, Mr. ______
was given the task. He was enthusiastic and willing to cooperate.

8. His presence in the high school band and orchestra seemed to inspire
otherwise dull performers. Since his departure they have accomplished
very little. Last summer he organized and directed an orchestra
of his own with marked success. Without his leadership, it has disbanded.

9.  Last  year  he  wrote  and  produced  an  operetta  in  high  school.  He 
was  able  to  get  students  to  rehearsals  when  others  would  have  failed. 
Almost  every  detail  came  under  his  direct  supervision. 

TRAIT  D 

1. Worried unduly by loss of little things: a book, cap, or track shoe.
Too  easily  depressed  by  apparently  unfavorable  prospects.  Disposition, 
as  a  rule,  sunny  and  likeable.  Needs  to  know  that  he  is  more  stable 
than  he  thinks  he  is. 

2.  His  interests  were  centered  too  much  on  his  studies.  He  took  very 
little  interest  in  athletics  and  extra-curricular  activities,  probably  partly 
due  to  his  reticence.  He  seemed  to  shrink  from  anything  that  would 
bring  him  before  the  eyes  of  others. 

3.  When  a  test,  or  paper  of  any  kind,  was  returned  to  him  with  a 
poor  grade,  instead  of  showing  a  determination  to  do  better,  he  almost 
invariably  came  in  to  see  if  he  could  drop  the  course.  On  examination 
he  seemed  very  nervous. 

4. I have seen him turn pale with anger at someone's remark.
Usually his sense of humor comes to the rescue. Is somewhat cynical,
[illegible] partly to the snobbish social life of his home town.

5.  Never  observed  him  to  be  excited  nor  enthusiastic  nor  especially 
opposed  to  any  movement.     Lukewarm,  I'd  say. 
6. He was a truck driver in our school last year; on cold days when
the  truck  would  sometimes  take  thirty  minutes  to  start  he  would  lose  his 
temper,  and  sometimes  would  actually  shed  tears. 

7.  Is  likely  to  be  unpleasant,  if  he  can't  have  his  own  way.  I  saw  this 
illustrated  when  he  couldn't  get  some  permission  he  was  very  anxious 
to  have. 

8.  A  little  inclined  to  fight  too  easily  and  knock  some  fellow  down, 
but  uses  his  fists. 

9.  When  things  went  wrong  and  others  got  excited,  he  just  laughed 
and  often  made  some  humorous  remark  that  cleared  the  air. 

10.  When  he  was  deliberately  knocked  down  in  a  match  basketball 
game  one  night,  he  was  very  much  aroused,  but  he  controlled  himself 
well  enough  not  to  talk  or  fight.  When  any  misfortune  happened  or 
something  very  comical,  he  very  seldom  made  any  jeers  or  unseeming 
gestures. 

11.  Not  easily  swayed  by  emotions  of  the  crowd.  Has  unusual  moral 
courage,  and  will  do  what  he  thinks  right — the  more  remarkable  since 
he  is  younger  than  his  fellows. 

12.  In  face  of  dislike  on  the  part  of  some  of  his  classmates  he  never, 
to  my  knowledge,  lost  control  of  his  emotions,  although  taunted.  Calm 
and  collected  during  stress  of  first  dramatic  performances. 

TRAIT  E 

1.  Have  often  heard  him  make  that  expression  "Just  get  by."  Catch 
his  interest  though,  and  he  will  apply  himself  vigorously.  He  needs  to 
be  oriented. 

2.  He  would  like  to  succeed,  but  just  doesn't  know  what  he  wants  to 
do.    Wonders  what  will  turn  up. 

3.  In  answer  to  this  question  my  thought  would  fall  mid-way  between 
"Has  vaguely  formed  objectives"  and  "Aims  just  to  get  by."  In  some 
of  his  work  at  school  he  occasionally  handed  in  really  good  exercises  and 
during  the  latter  part  of  the  senior  year  his  desire  to  enter  college  took 
him  out  of  the  "Get  by"  class.  I  believe  the  boy  has  more  good  in  him 
than  our  school  brought  out.  He  has  had  an  easy  life  and  needs  to  be 
free  from  home  pampering. 

4.  He  was  very  poor  in  French  and  was  receiving  very  discouraging 
marks.  He  had  a  newly  formed  desire  to  go  to  some  college  and  he 
persisted  in  his  French  and  improved  remarkably.  He  rejoiced  in  the 
best  marks  he  had  ever  had  in  all  subjects.  Yet  his  incentive  was  a 
vague  desire  for  college. 

5. He knows what he wants to get in life, but will not make extreme
efforts in its direction, especially if they should interfere with his pleasure.

6.  He  had  a  definite  program  which  allotted  time  for  preparation  for 
classroom  work,  definite  time  for  athletics  and  made  a  good  selection  of 
other  extra-curricular  activities.     He  would  have  no  overlapping. 

7.  He  always  did  his  school  work  on  time  and  was  keeping  up  with 
music  and  play  rehearsals  too.  Several  years  ago  he  made  up  his  mind 
to  be  a  doctor  and  with  this  in  view  he  spends  much  time  at  the  local 
hospital  or  making  calls  with  a  doctor.  (I  have  had  no  opportunity  to 
observe  his  program  of  activities,  but  am  judging  from  results.) 

8. He seemed to always be eager to get his school work done before
participating in other activities. His idea when in school seemed to be,
that he desired a college education, so that he might become a combination
of Coach and Teacher.

9.  He  is  to  study  law.  He  turned  the  senior  class  night  into  a  criminal 
court  program  entirely  different  from  the  usual  class  night  program. 
His  whole  thought  is  on  the  objective  he  has  set  up. 
Appendix D

Sample  of  personal  writeups  taken  from  a  college  annual 
for  1916. 

Age 20 Weight 155 Height 6 ft. ½ inch.

Phi Society; Class Baseball (1); Scrub (2, 3); Manager Class Baseball (4);
Class Football (4); All-Class (4); German Club; Coop; D. K. E.

Being a good, steady student is where "Fred" shines. He has a way
of dodging laboratories, but he does his other work well. Being a student
doesn't make "Fred" over-serious, for he takes life calmly, with
a quiet happy way, and never lets things worry him. In athletics he
starred as an All-Class end and baseball scrub.